NUCLEIC ACIDS AND PROTEINS FROM STREPTOCOCCUS PNEUMONIAE

The present invention relates to proteins derived from Streptococcus pneumoniae, 
nucleic acid molecules encoding such proteins, the use of the nucleic acid and/or 
proteins as antigens/immunogens and in detection/diagnosis, as well as methods for
screening the proteins/nucleic acid sequences as potential anti-microbial targets. 

Streptococcus pneumoniae, commonly referred to as the pneumococcus, is an 
important pathogenic organism. The continuing significance of Streptococcus
pneumoniae infections in relation to human disease in developing and developed
countries has been authoritatively reviewed (Siber, G.R., Science, 265: 1385-1387
(1994)). That review indicates that on a global scale this organism is believed to be the
most common bacterial cause of acute respiratory infections, and is estimated to
result in 1 million childhood deaths each year, mostly in developing countries
(Stansfield, S.K., Pediatr. Infect. Dis., 6: 622 (1987)). In the USA it has been
suggested (Breiman et al., Arch. Intern. Med., 150: 1401 (1990)) that the
pneumococcus is still the most common cause of bacterial pneumonia, and that
disease rates are particularly high in young children, in the elderly, and in patients
with predisposing conditions such as asplenia, heart, lung and kidney disease,
diabetes, alcoholism, or with immunosuppressive disorders, especially AIDS. These
groups are at higher risk of pneumococcal septicaemia and hence meningitis and 
therefore have a greater risk of dying from pneumococcal infection. The 
pneumococcus is also the leading cause of otitis media and sinusitis, which remain 
prevalent infections in children in developed countries, and which incur substantial 
costs. 

The need for effective preventative strategies against pneumococcal infection is 
highlighted by the recent emergence of penicillin-resistant pneumococci. It has been 
reported that 6.6% of pneumococcal isolates in 13 US hospitals in 12 states were found
to be resistant to penicillin and some isolates were also resistant to other antibiotics
including third generation cephalosporins (Schappert, S.M., Vital and Health Statistics
of the Centres for Disease Control/National Centre for Health Statistics, 214:1
(1992)). The rates of penicillin resistance can be higher (up to 20%) in some
hospitals (Breiman et al., J. Am. Med. Assoc., 271: 1831 (1994)). Since the
development of penicillin resistance among pneumococci is both recent and sudden, 
coming after decades during which penicillin remained an effective treatment, these 
findings are regarded as alarming. 

For the reasons given above, there are compelling grounds for considering
improvements in the means of preventing, controlling, diagnosing or treating 
pneumococcal diseases. 

Various approaches have been taken in order to provide vaccines for the prevention 
of pneumococcal infections. Difficulties arise for instance in view of the variety of 
serotypes (at least 90) based on the structure of the polysaccharide capsule 
surrounding the organism. Vaccines against individual serotypes are not effective 
against other serotypes and this means that vaccines must include polysaccharide 
antigens from a whole range of serotypes in order to be effective in a majority of 
cases. An additional problem arises because it has been found that the capsular 
polysaccharides (each of which determines the serotype and is the major protective 
antigen) when purified and used as a vaccine do not reliably induce protective 
antibody responses in children under two years of age, the age group which suffers 
the highest incidence of invasive pneumococcal infection and meningitis. 

A modification of the approach using capsule antigens relies on conjugating the 
polysaccharide to a protein in order to derive an enhanced immune response, 
particularly by giving the response T-cell dependent character. This approach has
been used in the development of a vaccine against Haemophilus influenzae, for
instance. There are, however, issues of cost concerning both the multi-polysaccharide
vaccines and those based on conjugates.

A third approach is to look for other antigenic components which offer the potential 
to be vaccine candidates. This is the basis of the present invention. Using a specially 
developed bacterial expression system, we have been able to identify a group of 
protein antigens from pneumococcus which are associated with the bacterial
envelope or which are secreted. 

Thus, in a first aspect the present invention provides a Streptococcus pneumoniae
protein or polypeptide having a sequence selected from those shown in Table 1.

In a second aspect, the present invention provides a Streptococcus pneumoniae
protein or polypeptide having a sequence selected from those shown in Table 2.

A protein or polypeptide of the present invention may be provided in substantially pure 
form. For example, it may be provided in a form which is substantially free of other 
proteins. 

As discussed herein, the proteins and polypeptides of the invention are useful as 
antigenic material. Such material can be "antigenic" and/or "immunogenic". 
Generally, "antigenic" is taken to mean that the protein or polypeptide is capable of 
being used to raise antibodies or indeed is capable of inducing an antibody response in 
a subject. "Immunogenic" is taken to mean that the protein or polypeptide is capable of 
eliciting a protective immune response in a subject. Thus, in the latter case, the protein 
or polypeptide may be capable of not only generating an antibody response but, in 
addition, a non-antibody based immune response.

The skilled person will appreciate that homologues or derivatives of the proteins or
polypeptides of the invention will also find use in the context of the present invention,
i.e. as antigenic/immunogenic material. Thus, for instance, proteins or polypeptides
which include one or more additions, deletions, substitutions or the like are
encompassed by the present invention. In addition, it may be possible to replace one
amino acid with another of similar "type", for instance replacing one hydrophobic
amino acid with another.

One can use a program such as the CLUSTAL program to compare amino acid 
sequences. This program compares amino acid sequences and finds the optimal 
alignment by inserting spaces in either sequence as appropriate. It is possible to 
calculate amino acid identity or similarity (identity plus conservation of amino acid 
type) for an optimal alignment. A program like BLASTx will align the longest stretch 
of similar sequences and assign a value to the fit. It is thus possible to obtain a 
comparison where several regions of similarity are found, each having a different 
score. Both types of identity analysis are contemplated in the present invention. 
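
The distinction drawn above between identity and similarity (identity plus conservation of amino acid type) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (for example by a CLUSTAL-type program) and are supplied as equal-length strings with "-" marking inserted spaces. The residue classes, the function names and the choice to count gapped columns in the denominator are illustrative assumptions and are not taken from the present description.

```python
# Coarse residue classes used only for illustration (an assumption, not
# a grouping defined in the text or by any particular alignment program).
RESIDUE_CLASSES = [
    set("AVLIMFWC"),  # hydrophobic
    set("STNQYGP"),   # polar / small
    set("KRH"),       # basic
    set("DE"),        # acidic
]


def same_class(a, b):
    """Return True if both residues fall in the same illustrative class."""
    return any(a in group and b in group for group in RESIDUE_CLASSES)


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Percent identity and percent similarity for a pre-aligned pair.

    Columns where both sequences have a gap are ignored; columns where
    only one sequence has a gap count towards the total but score nothing.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == gap or b == gap:
            continue
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif same_class(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical aligned fragments, invented for this example.
identity, similarity = identity_and_similarity("MKV-LDE", "MRVALDD")
print(f"identity {identity:.2f}%, similarity {similarity:.2f}%")
```

In this invented example the K/R and E/D columns are conservative replacements, so they raise the similarity figure without raising the identity figure.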

In the case of homologues and derivatives, the degree of identity with a protein or 
polypeptide as described herein is less important than that the homologue or derivative
should retain the antigenicity or immunogenicity of the original protein or polypeptide. 
However, suitably, homologues or derivatives having at least 60% similarity (as 
discussed above) with the proteins or polypeptides described herein are provided. 
Preferably, homologues or derivatives having at least 70% similarity, more preferably 
at least 80% similarity are provided. Most preferably, homologues or derivatives 
having at least 90% or even 95% similarity are provided. 

In an alternative approach, the homologues or derivatives could be fusion proteins, 
incorporating moieties which render purification easier, for example by effectively
tagging the desired protein or polypeptide. It may be necessary to remove the "tag" or
it may be the case that the fusion protein itself retains sufficient antigenicity to be 
useful. 

In an additional aspect of the invention there are provided antigenic/immunogenic 
fragments of the proteins or polypeptides of the invention, or of homologues or 
derivatives thereof. 

For fragments of the proteins or polypeptides described herein, or of homologues or 
derivatives thereof, the situation is slightly different. It is well known that it is possible to
screen an antigenic protein or polypeptide to identify epitopic regions, i.e. those regions
which are responsible for the protein or polypeptide's antigenicity or immunogenicity. 
Methods for carrying out such screening are well known in the art. Thus, the fragments 
of the present invention should include one or more such epitopic regions or be 
sufficiently similar to such regions to retain their antigenic/immunogenic properties. 
Thus, for fragments according to the present invention the degree of identity is perhaps 
irrelevant, since they may be 100% identical to a particular part of a protein or 
polypeptide, homologue or derivative as described herein. The key issue, once again, is 
that the fragment retains the antigenic/immunogenic properties. 

Thus, what is important for homologues, derivatives and fragments is that they possess 
at least a degree of the antigenicity/immunogenicity of the protein or polypeptide from 
which they are derived. 

Gene cloning techniques may be used to provide a protein of the invention in 
substantially pure form. These techniques are disclosed, for example, in J. Sambrook
et al., Molecular Cloning, 2nd Edition, Cold Spring Harbor Laboratory Press (1989).

Thus, in a third aspect, the present invention provides a nucleic acid molecule
comprising or consisting of a sequence which is:

(i) any of the DNA sequences set out in Table 1 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i); 

(iii) a sequence which codes for the same protein or polypeptide, as those 
sequences of (i) or (ii); 

(iv) a sequence which has substantial identity with any of those of (i), (ii)
and (iii); or

(v) a sequence which codes for a homologue, derivative or fragment of a
protein as defined in Table 1.

In a fourth aspect the present invention provides a nucleic acid molecule comprising or 
consisting of a sequence which is: 

(i) any of the DNA sequences set out in Table 2 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i); 

(iii) a sequence which codes for the same protein or polypeptide, as those 
sequences of (i) or (ii); 

(iv) a sequence which has substantial identity with any of those of (i), (ii) 
and (iii); or

(v) a sequence which codes for a homologue, derivative or fragment of a
protein as defined in Table 2.

The nucleic acid molecules of the invention may include a plurality of such sequences, 
and/or fragments. The skilled person will appreciate that the present invention can 
include novel variants of those particular novel nucleic acid molecules which are 
exemplified herein. Such variants are encompassed by the present invention. These 
may occur in nature, for example because of strain variation. For example, additions, 
substitutions and/or deletions are included. In addition, and particularly when utilising 
microbial expression systems, one may wish to engineer the nucleic acid sequence by 
making use of known preferred codon usage in the particular organism being used for 
expression. Thus, synthetic or non-naturally occurring variants are also included within 
the scope of the invention. 
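
The idea of engineering a coding sequence towards the preferred codon usage of an expression host, without changing the encoded protein, can be sketched as follows. This is an illustrative Python example only: the `PREFERRED` table is a hypothetical placeholder rather than measured codon usage for any organism, and the input sequence is invented. Only the standard genetic code table is factual.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferred codons for an expression host (placeholder values,
# not real usage data); amino acids not listed keep their original codon.
PREFERRED = {"L": "TTA", "K": "AAA", "E": "GAA", "G": "GGT"}


def translate(dna):
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna, preferred=PREFERRED):
    """Swap each codon for the host-preferred synonymous codon, if any."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon)
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"{new_codon} does not encode {aa}")
        out.append(new_codon)
    return "".join(out)


original = "ATGCTGAAGGAGGGCTAA"  # invented sequence for illustration
recoded = recode(original)
print(recoded, translate(recoded))
```

The recoded sequence differs at the nucleotide level but translates to the same polypeptide, which is the sense in which such synthetic variants code for the same protein.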

The term "RNA equivalent" when used above indicates that a given RNA molecule has 
a sequence which is complementary to that of a given DNA molecule (allowing for the 
fact that in RNA "U" replaces "T" in the genetic code). 
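
As an illustration of the "U" for "T" replacement referred to above, the following minimal Python sketch derives an RNA sequence from a DNA sequence in two ways: the strand complementary to the given DNA (a literal reading of the definition above) and, for comparison, the same-sense sequence with only T replaced by U. The function names are hypothetical and the input is invented.

```python
# Base-pairing rules for an RNA strand read against a DNA strand.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complementary_rna(dna):
    """RNA strand complementary to `dna`, written 5' to 3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def same_sense_rna(dna):
    """RNA with the same sense as `dna`; only T is replaced by U."""
    return dna.upper().replace("T", "U")


print(complementary_rna("ATGC"), same_sense_rna("ATGC"))
```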

When comparing nucleic acid sequences for the purposes of determining the degree of 
homology or identity one can use programs such as BESTFIT and GAP (both from the 
Wisconsin Genetics Computer Group (GCG) software package). BESTFIT, for
example, compares two sequences and produces an optimal alignment of the most 
similar segments. GAP enables sequences to be aligned along their whole length and 
finds the optimal alignment by inserting spaces in either sequence as appropriate. 
Suitably, in the context of the present invention when discussing identity of nucleic acid 
sequences, the comparison is made by alignment of the sequences along their whole 
length.

Preferably, sequences which have substantial identity have at least 50% sequence
identity, desirably at least 75% sequence identity and more desirably at least 90% or at
least 95% sequence identity with said sequences. In some cases the sequence identity
may be 99% or above.
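
A whole-length comparison of the kind described above can be sketched with a simple global (Needleman-Wunsch style) alignment. The Python example below is a minimal illustration and not the GAP program itself: the scoring values (match +1, mismatch -1, gap -1), the definition of percent identity as matches divided by alignment length, and all function names are assumptions made for the sketch. The thresholds listed are those stated in the text.

```python
THRESHOLDS = (50, 75, 90, 95, 99)  # identity levels mentioned in the text


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Align two sequences along their whole length; '-' marks a space."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical columns as a percentage of the full alignment length."""
    aligned_a, aligned_b = global_align(a.upper(), b.upper())
    if not aligned_a:
        raise ValueError("cannot compare two empty sequences")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def thresholds_met(pct):
    """Return the identity thresholds from the text that `pct` reaches."""
    return [t for t in THRESHOLDS if pct >= t]


pct = percent_identity("ACGTACGTAC", "ACGTTCGTAC")  # invented sequences
print(pct, thresholds_met(pct))
```

Other denominators (for example the length of the shorter sequence) are also in common use, so a reported figure should state which convention was applied.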

Desirably, the term "substantial identity" indicates that said sequence has a greater 
degree of identity with any of the sequences described herein than with prior art nucleic 
acid sequences. 

It should however be noted that where a nucleic acid sequence of the present invention 
codes for at least part of a novel gene product the present invention includes within its 
scope all possible sequences coding for the gene product or for a novel part thereof.

The nucleic acid molecule may be in isolated or recombinant form. It may be 
incorporated into a vector and the vector may be incorporated into a host. Such vectors 
and suitable hosts form yet further aspects of the present invention. 

Therefore, for example, by using probes based upon the nucleic acid sequences 
provided herein, genes in Streptococcus pneumoniae can be identified. They can then 
be excised using restriction enzymes and cloned into a vector. The vector can be 
introduced into a suitable host for expression. 

Nucleic acid molecules of the present invention may be obtained from S. pneumoniae by
the use of appropriate probes complementary to part of the sequences of the nucleic
acid molecules. Restriction enzymes or sonication techniques can be used to obtain
appropriately sized fragments for probing.

Alternatively, PCR techniques may be used to amplify a desired nucleic acid sequence.
Thus the sequence data provided herein can be used to design two primers for use in 
PCR so that a desired sequence, including whole genes or fragments thereof, can be 
targeted and then amplified to a high degree. 

Typically primers will be at least 15-25 nucleotides long. 
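
By way of illustration only, a primer pair for a target region can be derived from sequence data as sketched below in Python. The forward primer is taken from the 5' end of the given strand and the reverse primer is the reverse complement of its 3' end. The function names and the target sequence are hypothetical, and the sketch deliberately ignores practical design criteria such as melting temperature, GC content and primer-dimer formation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(target, length=20):
    """Return (forward, reverse) primers flanking `target`, both 5' to 3'."""
    target = target.upper()
    if not 15 <= length <= 25:
        raise ValueError("primer length should be 15-25 nucleotides")
    if len(target) < 2 * length:
        raise ValueError("target too short for two non-overlapping primers")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


# Invented 42-nucleotide target, used purely as an example.
target = "ATGAAACGTCTGGCTATCGGTACCTTAGCAGCTGTTGCATAA"
forward, reverse = design_primers(target, length=15)
print(forward, reverse)
```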

As a further alternative, chemical synthesis may be used. This may be automated.
Relatively short sequences may be chemically synthesised and ligated together to 
provide a longer sequence. 

There is another group of proteins from S. pneumoniae which have been identified
using the bacterial expression system described herein. These are known proteins
from S. pneumoniae, which have not previously been identified as antigenic proteins.
The amino acid sequences of this group of proteins, together with DNA sequences 
coding for them are shown in Table 3. These proteins, or homologues, derivatives 
and/or fragments thereof also find use as antigens/immunogens. Thus, in another 
aspect the present invention provides the use of a protein or polypeptide having a 
sequence selected from those shown in Tables 1-3, or homologues, derivatives 
and/or fragments thereof, as an immunogen/antigen. 

In yet a further aspect the present invention provides an immunogenic/antigenic 
composition comprising one or more proteins or polypeptides selected from those 
whose sequences are shown in Tables 1-3, or homologues or derivatives thereof, 
and/or fragments of any of these. In preferred embodiments, the 
immunogenic/antigenic composition is a vaccine or is for use in a diagnostic assay. 

In the case of vaccines, suitable additional excipients, diluents, adjuvants or the like
may be included. Numerous examples of these are well known in the art.

It is also possible to utilise the nucleic acid sequences shown in Tables 1-3 in the 
preparation of so-called DNA vaccines. Thus, the invention also provides a vaccine 
composition comprising one or more nucleic acid sequences as defined herein. DNA 
vaccines are described in the art (see for instance, Donnelly et al., Ann. Rev.
Immunol., 15:617-648 (1997)) and the skilled person can use such art-described
techniques to produce and use DNA vaccines according to the present invention.

As already discussed herein the proteins or polypeptides described herein, their 
homologues or derivatives, and/or fragments of any of these, can be used in methods 
of detecting/diagnosing S. pneumoniae. Such methods can be based on the detection
of antibodies against such proteins which may be present in a subject. Therefore the
present invention provides a method for the detection/diagnosis of S. pneumoniae
which comprises the step of bringing into contact a sample to be tested with at least 
one protein, or homologue, derivative or fragment thereof, as described herein. 
Suitably, the sample is a biological sample, such as a tissue sample or a sample of 
blood or saliva obtained from a subject to be tested. 

In an alternative approach, the proteins described herein, or homologues, derivatives 
and/or fragments thereof, can be used to raise antibodies, which in turn can be used 
to detect the antigens, and hence S. pneumoniae. Such antibodies form another aspect
of the invention. Antibodies within the scope of the present invention may be 
monoclonal or polyclonal. 

WO 00/06738 PCT/CB99/02452

Polyclonal antibodies can be raised by stimulating their production in a suitable animal
host (e.g. a mouse, rat, guinea pig, rabbit, sheep, goat or monkey) when a protein as
described herein, or a homologue, derivative or fragment thereof, is injected into the
animal. If desired, an adjuvant may be administered together with the protein. Well-known
adjuvants include Freund's adjuvant (complete and incomplete) and aluminium
hydroxide. The antibodies can then be purified by virtue of their binding to a protein as
described herein.

Monoclonal antibodies can be produced from hybridomas. These can be formed by
fusing myeloma cells and spleen cells which produce the desired antibody in order to
form an immortal cell line. Thus the well-known Kohler & Milstein technique (Nature
256 (1975)) or subsequent variations upon this technique can be used.

Techniques for producing monoclonal and polyclonal antibodies that bind to a 
particular polypeptide/protein are now well developed in the art. They are discussed in 
standard immunology textbooks, for example in Roitt et al, Immunology second edition 
(1989), Churchill Livingstone, London. 

In addition to whole antibodies, the present invention includes derivatives thereof which 
are capable of binding to proteins etc as described herein. Thus the present invention 
includes antibody fragments and synthetic constructs. Examples of antibody fragments 
and synthetic constructs are given by Dougall et al in Tibtech 12 372-379 (September 
1994). 

Antibody fragments include, for example, Fab, F(ab')2 and Fv fragments (these are
discussed in Roitt et al [supra]). Fv fragments can be modified
to produce a synthetic construct known as a single chain Fv (scFv) molecule. This
includes a peptide linker covalently joining VH and VL regions, which contributes to the
stability of the molecule. Other synthetic constructs that can be used include CDR
peptides. These are synthetic peptides comprising antigen-binding determinants.
Peptide mimetics may also be used. These molecules are usually conformationally
restricted organic rings that mimic the structure of a CDR loop and that include
antigen-interactive side chains.

Synthetic constructs include chimaeric molecules. Thus, for example, humanised (or 
primatised) antibodies or derivatives thereof are within the scope of the present 
invention. An example of a humanised antibody is an antibody having human 
framework regions, but rodent hypervariable regions. Ways of producing chimaeric 
antibodies are discussed for example by Morrison et al in PNAS, 81, 6851-6855 (1984) 
and by Takeda et al in Nature, 314, 452-454 (1985).

Synthetic constructs also include molecules comprising an additional moiety that 
provides the molecule with some desirable property in addition to antigen binding. For 
example the moiety may be a label (e.g. a fluorescent or radioactive label). 
Alternatively, it may be a pharmaceutically active agent.

Antibodies, or derivatives thereof, find use in detection/diagnosis of S. pneumoniae.
Thus, in another aspect the present invention provides a method for the
detection/diagnosis of S. pneumoniae which comprises the step of bringing into contact
a sample to be tested and antibodies capable of binding to one or more proteins 
described herein, or to homologues, derivatives and/or fragments thereof. 

In addition, so-called "Affibodies" may be utilised. These are binding proteins 
selected from combinatorial libraries of an alpha-helical bacterial receptor domain 
(Nord et al.). Thus, small protein domains capable of specific binding to different
target proteins can be selected using combinatorial approaches.

It will also be clear that the nucleic acid sequences described herein may be used to 
detect/diagnose S. pneumoniae. Thus, in yet a further aspect, the present invention
provides a method for the detection/diagnosis of S. pneumoniae which comprises the
step of bringing into contact a sample to be tested with at least one nucleic acid
sequence as described herein. Suitably, the sample is a biological sample, such as a
tissue sample or a sample of blood or saliva obtained from a subject to be tested.
Such samples may be pre-treated before being used in the methods of the invention.
Thus, for example, a sample may be treated to extract DNA. Then, DNA probes
based on the nucleic acid sequences described herein (i.e. usually fragments of such
sequences) may be used to detect nucleic acid from S. pneumoniae.

In additional aspects, the present invention provides: 

(a) a method of vaccinating a subject against S. pneumoniae which comprises the
step of administering to a subject a protein or polypeptide of the invention, or a
derivative, homologue or fragment thereof, or an immunogenic composition of the
invention;

(b) a method of vaccinating a subject against S. pneumoniae which comprises the
step of administering to a subject a nucleic acid molecule as defined herein;

(c) a method for the prophylaxis or treatment of S. pneumoniae infection which
comprises the step of administering to a subject a protein or polypeptide of the
invention, or a derivative, homologue or fragment thereof, or an immunogenic
composition of the invention;

(d) a method for the prophylaxis or treatment of S. pneumoniae infection which
comprises the step of administering to a subject a nucleic acid molecule as defined
herein;

(e) a kit for use in detecting/diagnosing S. pneumoniae infection comprising one
or more proteins or polypeptides of the invention, or homologues, derivatives or
fragments thereof, or an antigenic composition of the invention; and

(f) a kit for use in detecting/diagnosing S. pneumoniae infection comprising one
or more nucleic acid molecules as defined herein.

Given that we have identified a group of important proteins, such proteins are
potential targets for anti-microbial therapy. It is necessary, however, to determine 
whether each individual protein is essential for the organism's viability. Thus, the 
present invention also provides a method of determining whether a protein or 
polypeptide as described herein represents a potential anti-microbial target which 
comprises antagonising, inhibiting or otherwise interfering with the function or 
expression of said protein and determining whether S. pneumoniae is still viable.

A suitable method for inactivating the protein is to effect selected gene knockouts, i.e.
prevent expression of the protein and determine whether this results in a lethal
change. Suitable methods for carrying out such gene knockouts are described in Li
et al., P.N.A.S., 94:13251-13256 (1997) and Kolkman et al., 178:3736-3741 (1996).

In a final aspect the present invention provides the use of an agent capable of 
antagonising, inhibiting or otherwise interfering with the function or expression of a 
protein or polypeptide of the invention in the manufacture of a medicament for use in 
the treatment or prophylaxis of S. pneumoniae infection.

As mentioned above, we have used a bacterial expression system as a means of 
identifying those proteins which are surface associated, secreted or exported and
thus would find use as antigens.

The information necessary for the secretion/export of proteins has been extensively 
studied in bacteria. In the majority of cases, protein export requires a signal peptide 
to be present at the N-terminus of the precursor protein so that it becomes directed to 
the translocation machinery on the cytoplasmic membrane. During or after 
translocation, the signal peptide is removed by a membrane associated signal 
peptidase. Ultimately the localization of the protein (i.e. whether it be secreted, an 
integral membrane protein or attached to the cell wall) is determined by sequences 
other than the leader peptide itself. 

We are specifically interested in surface located or exported proteins as these are 
likely to be antigens for use in vaccines, as diagnostic reagents or as targets for 
therapy with novel chemical entities. We have therefore developed a screening 
vector-system in Lactococcus lactis that permits genes encoding exported proteins to 
be identified and isolated. We provide below a representative example showing how 
given novel surface associated proteins from Streptococcus pneumoniae have been 
identified and characterized. The screening vector incorporates the staphylococcal 
nuclease gene nuc lacking its own export signal as a secretion reporter. 
Staphylococcal nuclease is a naturally secreted heat-stable, monomeric enzyme 
which has been efficiently expressed and secreted in a range of Gram positive 
bacteria (Shortle, Gene, 22:181-189 (1983); Kovacevic et al., J. Bacteriol.,
162:521-528 (1985); Miller et al., J. Bacteriol., 169:3508-3514 (1987); Liebl et al.,
J. Bacteriol., 174:1854-1861 (1992); Le Loir et al., J. Bacteriol., 176:5135-5139
(1994); Poquet et al., J. Bacteriol., 180:1904-1912 (1998)).



Recently, Poquet et al. ((1998), supra) have described a screening vector 
incorporating the nuc gene lacking its own signal leader as a reporter to identify 
exported proteins in Gram positive bacteria, and have applied it to L. lactis. This
vector (pFUN) contains the pAMβ1 replicon which functions in a broad host range
of Gram-positive bacteria in addition to the ColE1 replicon that promotes replication
in Escherichia coli and certain other Gram negative bacteria. Unique cloning sites
present in the vector can be used to generate transcriptional and translational fusions
between cloned genomic DNA fragments and the open reading frame of the
truncated nuc gene devoid of its own signal secretion leader. The nuc gene makes an
ideal reporter gene because the secretion of nuclease can readily be detected using a
simple and sensitive plate test: recombinant colonies secreting the nuclease develop
a pink halo whereas control colonies remain white (Shortle, (1983), supra; Le Loir
et al., (1994), supra).

Thus, the invention will now be described with reference to the following
representative example, which provides details of how the proteins, polypeptides and
nucleic acid sequences described herein were identified as antigenic targets.

We describe herein the construction of three reporter vectors and their use in L. 
lactis to identify and isolate genomic DNA fragments from Streptococcus 
pneumoniae encoding secreted or surface associated proteins. 

The invention will now be described with reference to the examples, which should 
not be construed as in any way limiting the invention. The examples refer to the 
figures in which: 

Figure 1: shows the results of a number of DNA vaccine trials; and 
Figure 2: shows the results of further DNA vaccine trials.

EXAMPLE 1



(i) Construction of the pTREP1-nuc series of reporter vectors

(a) Construction of expression plasmid pTREP1

The pTREP1 plasmid is a high-copy number (40-80 per cell) theta-replicating
Gram-positive plasmid, which is a derivative of the pTREX plasmid which is itself a
derivative of the previously published pIL253 plasmid. pIL253 incorporates the
broad Gram-positive host range replicon of pAMβ1 (Simon and Chopin, Biochimie,
70:559-567 (1988)) and is non-mobilisable by the L. lactis sex-factor. pIL253 also
lacks the tra function which is necessary for transfer or efficient mobilisation by
conjugative parent plasmids exemplified by pIL501. The Enterococcal pAMβ1
replicon has previously been transferred to various species including Streptococcus,
Lactobacillus and Bacillus species as well as Clostridium acetobutylicum (Oultram
and Klaenhammer, FEMS Microbiological Letters, 27:129-134 (1985); Gibson et
al., (1979); LeBlanc et al., Proceedings of the National Academy of Science USA,
75:3484-3487 (1978)), indicating the potential broad host range utility. The pTREP1
plasmid represents a constitutive transcription vector.

The pTREX vector was constructed as follows. An artificial DNA fragment
containing a putative RNA stabilising sequence, a translation initiation region (TIR),
a multiple cloning site for insertion of the target genes and a transcription terminator
was created by annealing 2 complementary oligonucleotides and extending with Tfl
DNA polymerase. The sense and anti-sense oligonucleotides contained the
recognition sites for NheI and BamHI at their 5' ends respectively to facilitate
cloning. This fragment was cloned between the XbaI and BamHI sites in
pUC19NT7, a derivative of pUC19 which contains the T7 expression cassette from
pLET1 (Wells et al., J. Appl. Bacteriol., 74:629-636 (1993)) cloned between the
EcoRI and HindIII sites. The resulting construct was designated pUCLEX. The
complete expression cassette of pUCLEX was then removed by cutting with HindIII
and blunting followed by cutting with EcoRI before cloning into EcoRI and SacI
(blunted) sites of pIL253 to generate the vector pTREX (Wells and Schofield, In
Current advances in metabolism, genetics and applications-NATO ASI Series, H
98:37-62 (1996)). The putative RNA stabilising sequence and TIR are derived from
the Escherichia coli T7 bacteriophage sequence and modified at one nucleotide
position to enhance the complementarity of the Shine Dalgarno (SD) motif to the
ribosomal 16S RNA of Lactococcus lactis (Schofield et al. pers. comm. University of
Cambridge Dept. Pathology).

A Lactococcus lactis MG1363 chromosomal DNA fragment exhibiting promoter 
activity which was subsequently designated P7 was cloned between the EcoRI and 
BglII sites present in the expression cassette, creating pTREX7. This active
promoter region had been previously isolated using the promoter probe vector
pSB292 (Waterfield et al., Gene, 165:9-15 (1995)). The promoter fragment was
amplified by PCR using the Vent DNA polymerase according to the manufacturer's
instructions.

The pTREP1 vector was then constructed as follows. An artificial DNA fragment
which included a transcription terminator, the forward pUC sequencing primer, a
promoter multiple-cloning site region and a universal translation stop sequence was
created by annealing two overlapping partially complementary synthetic
oligonucleotides together and extending with Sequenase according to the manufacturer's
instructions. The sense and anti-sense (pTREPF and pTREPR) oligonucleotides
contained the recognition sites for EcoRV and BamHI at their 5' ends respectively to
facilitate cloning into pTREX7. The transcription terminator was that of the Bacillus
penicillinase gene, which has been shown to be effective in Lactococcus (Jos et al.,
Applied and Environmental Microbiology, 50:540-542 (1985)). This was considered
necessary as expression of target genes in the pTREX vectors was observed to be
leaky and is thought to be the result of cryptic promoter activity in the origin region
(Schofield et al., pers. comm., University of Cambridge, Dept. Pathology). The
forward pUC sequencing primer was included to enable direct sequencing of cloned
DNA fragments. The translation stop sequence, which encodes a stop codon in 3
different frames, was included to prevent translational fusions between vector genes
and cloned DNA fragments. The pTREX7 vector was first digested with EcoRI and
blunted using the 5' - 3' polymerase activity of T4 DNA polymerase (NEB)
according to the manufacturer's instructions. The EcoRI digested and blunt ended
pTREX7 vector was then digested with BglII, thus removing the P7 promoter. The
artificial DNA fragment derived from the annealed synthetic oligonucleotides was
then digested with EcoRV and BamHI and cloned into the EcoRI(blunted)-BglII
digested pTREX7 vector to generate pTREP. A Lactococcus lactis MG1363
chromosomal promoter designated P1 was then cloned between the EcoRI and BglII
sites present in the pTREP expression cassette, forming pTREP1. This promoter was
also isolated using the promoter probe vector pSB292 and characterised by
Waterfield et al., (1995), supra. The P1 promoter fragment was originally
amplified by PCR using Vent DNA polymerase according to the manufacturer's
instructions and cloned into pTREX as an EcoRI-BglII DNA fragment. The
EcoRI-BglII P1 promoter containing fragment was removed from pTREX1 by
restriction enzyme digestion and used for cloning into pTREP (Schofield et al., pers.
comm., University of Cambridge, Dept. Pathology).
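
The role of a translation stop sequence carrying a stop codon in all three frames can be illustrated with a minimal Python sketch. The function name and the example sequence are hypothetical and chosen only for illustration; the example is not the stop sequence actually present in pTREP1.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> set[int]:
    """Return the forward reading frames (0, 1, 2) of seq that contain a stop codon."""
    seq = seq.upper()
    return {
        i % 3
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in STOP_CODONS
    }


# Hypothetical illustrative sequence (an assumption, not taken from the vector).
example = "TAACTAGCTGA"
print(sorted(frames_with_stop(example)))
```

A sequence for which this check returns all of 0, 1 and 2 terminates translation whichever frame a read-through enters in, which is the property relied on to prevent translational fusions.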

(b) PCR amplification of the S. aureus nuc gene

The nucleotide sequence of the S. aureus nuc gene (EMBL database accession
number V01281) was used to design synthetic oligonucleotide primers for PCR
amplification. The primers were designed to amplify the mature form of the nuc
gene designated nucA which is generated by proteolytic cleavage of the N-terminal
19 to 21 amino acids of the secreted propeptide designated Snase B (Shortle, (1983),
supra). Three sense primers (nucS1, nucS2 and nucS3, Appendix 1) were designed,
each one having a blunt-ended restriction endonuclease cleavage site for EcoRV or
SmaI in a different reading frame with respect to the nuc gene. Additionally BglII
and BamHI sites were incorporated at the 5' ends of the sense and anti-sense primers
respectively to facilitate cloning into BamHI and BglII cut pTREP1. The sequences
of all the primers are given in Appendix 1. Three nuc gene DNA fragments
encoding the mature form of the nuclease gene (NucA) were amplified by PCR using
each of the sense primers combined with the anti-sense primer described above. The
nuc gene fragments were amplified by PCR using S. aureus genomic DNA template,
Vent DNA Polymerase (NEB) and the conditions recommended by the
manufacturer. An initial denaturation step at 93 °C for 2 min was followed by 30
cycles of denaturation at 93 °C for 45 sec, annealing at 50 °C for 45 sec, and
extension at 73 °C for 1 min, and then a final 5 min extension step at 73 °C. The
PCR amplified products were purified using a Wizard clean up column (Promega) to
remove unincorporated nucleotides and primers.
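
As a worked example of the cycling conditions above, the following Python sketch encodes the programme and sums the hold times. The class and variable names are illustrative assumptions; ramp times between temperatures depend on the instrument and are not included.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


# Values taken from the text; the data structure itself is an assumption.
INITIAL = Step("initial denaturation", 93, 120)
CYCLE = [
    Step("denaturation", 93, 45),
    Step("annealing", 50, 45),
    Step("extension", 73, 60),
]
FINAL = Step("final extension", 73, 300)
N_CYCLES = 30


def total_hold_seconds(initial: Step, cycle: list[Step], final: Step, n_cycles: int) -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


total = total_hold_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
print(f"{total} s ({total / 60:.0f} min) of programmed hold time")
```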

(c) Construction of the pTREP1-nuc vectors

The purified nuc gene fragments described in section b were digested with BglII and
BamHI using standard conditions and ligated to BamHI and BglII cut and
dephosphorylated pTREP1 to generate the pTREP1-nuc1, pTREP1-nuc2 and
pTREP1-nuc3 series of reporter vectors. General molecular biology techniques were
carried out using the reagents and buffer supplied by the manufacturer or using
standard conditions (Sambrook and Maniatis, (1989), supra). In each of the
pTREP1-nuc vectors the expression cassette comprises a transcription terminator, lactococcal
promoter P1, unique cloning sites (BglII, EcoRV or SmaI) followed by the mature
form of the nuc gene and a second transcription terminator. Note that the sequences
required for translation and secretion of the nuc gene were deliberately excluded in
this construction. Such elements can only be provided by appropriately digested
foreign DNA fragments (representing the target bacterium) which can be cloned into
the unique restriction sites present immediately upstream of the nuc gene.
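
The statement that the three sense primers place the blunt cloning site in a different reading frame relative to nuc can be checked with a short Python sketch. It uses the nucS1, nucS2 and nucS3 sequences from Appendix 1 (nucS3 with its internal spaces removed and its 3' end assumed identical to nucS1 and nucS2). Treating the shared 3' portion beginning TCACAAACAG as the start of the nuc-annealing region is an assumption made for this illustration, as are all function and variable names.

```python
# Blunt cutters: recognition site and number of site bases upstream of the cut
# (EcoRV cuts GAT^ATC, SmaI cuts CCC^GGG).
BLUNT_SITES = {"EcoRV": ("GATATC", 3), "SmaI": ("CCCGGG", 3)}

PRIMERS = {
    "nucS1": "cgagatctgatatctcacaaacagataacggcgtaaatag",
    "nucS2": "gaagatcttccccgggatcacaaacagataacggcgtaaatag",
    "nucS3": "cgagatctgatatccatcacaaacagataacggcgtaaatag",
}

# Assumption: the region common to all three primers anneals to nuc.
NUC_START = "TCACAAACAG"


def frame_offset(primer: str) -> tuple[str, int]:
    """Return (enzyme, number of bases between blunt cut and nuc start, modulo 3)."""
    primer = primer.upper()
    nuc_pos = primer.index(NUC_START)
    for enzyme, (site, cut) in BLUNT_SITES.items():
        pos = primer.find(site)
        if pos != -1:
            return enzyme, (nuc_pos - (pos + cut)) % 3
    raise ValueError("no blunt-cutter site found in primer")


for name, seq in PRIMERS.items():
    print(name, *frame_offset(seq))
```

Under these assumptions the three primers give three distinct offsets, so a blunt-ended insert in any frame can be matched with one of the three vectors.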

In possessing a promoter, the pTREP1-nuc vectors differ from the pFUN vector
described by Poquet et al. (1998), supra, which was used to identify L. lactis
exported proteins by screening directly for Nuc activity in L. lactis. As the
pFUN vector does not contain a promoter upstream of the nuc open reading frame,
the cloned genomic DNA fragment must also provide the signals for transcription in
addition to those elements required for translation initiation and secretion of Nuc.
This limitation may prevent the isolation of genes that are distant from a promoter,
for example genes which are within polycistronic operons. Additionally there can be
no guarantee that promoters derived from other species of bacteria will be
recognised and functional in L. lactis. Certain promoters may be under stringent
regulation in the natural host but not in L. lactis. In contrast, the presence of the P1
promoter in the pTREP1-nuc series of vectors ensures that promoterless DNA
fragments (or DNA fragments containing promoter sequences not active in L. lactis)
will still be transcribed.

(d) Screening for secreted proteins in S. pneumoniae

Genomic DNA isolated from S. pneumoniae was digested with the restriction
enzyme Tru9I. This enzyme, which recognises the sequence 5'- TTAA -3', was used
because it cuts A/T rich genomes efficiently and can generate random genomic
DNA fragments within the preferred size range (usually averaging 0.5 - 1.0 kb).
This size range was preferred because there is an increased probability that the P1
promoter can be utilised to transcribe a novel gene sequence. However, the P1
promoter may not be necessary in all cases as it is possible that many streptococcal
promoters are recognised in L. lactis. DNA fragments of different size ranges were
purified from partial Tru9I digests of S. pneumoniae genomic DNA. As the Tru9I
restriction enzyme generates staggered ends, the DNA fragments had to be made
blunt ended before ligation to the EcoRV or SmaI cut pTREP1-nuc vectors. This
was achieved by the partial fill-in enzyme reaction using the 5'-3' polymerase
activity of Klenow enzyme. Briefly, Tru9I digested DNA was dissolved in a solution
(usually between 10-20 µl in total) supplemented with T4 DNA ligase buffer (New
England Biolabs; NEB) (1X) and 33 µM of each of the required dNTPs, in this case
dATP and dTTP. Klenow enzyme was added (1 unit Klenow enzyme (NEB) per µg
of DNA) and the reaction incubated at 25°C for 15 minutes. The reaction was
stopped by incubating the mix at 75°C for 20 minutes. EcoRV or SmaI digested
pTREP-nuc plasmid DNA was then added (usually between 200-400 ng). The mix
was then supplemented with 400 units of T4 DNA ligase (NEB) and T4 DNA ligase
buffer (1X) and incubated overnight at 16°C. The ligation mix was precipitated
directly in 100% ethanol and 1/10 volume of 3M sodium acetate (pH 5.2) and used
to transform L. lactis MG1363 (Gasson, 1983). Alternatively, the gene cloning site
of the pTREP-nuc vectors also contains a BglII site which can be used to clone, for
example, Sau3AI digested genomic DNA fragments.
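
To make the digestion step concrete, the following Python sketch performs a complete in silico digest at every TTAA site, cutting after the first T (T^TAA) to leave the two-base 5' overhang that is filled in with dATP and dTTP. The function name and the toy sequence are hypothetical. The procedure described above used partial digests, which give longer fragments than the complete digest modelled here.

```python
import re


def digest(seq: str, site: str = "TTAA", cut_offset: int = 1) -> list[str]:
    """Cut seq at every occurrence of site.

    cut_offset is the position of the cut within the site on the top strand
    (1 for T^TAA). Only the top strand is modelled.
    """
    seq = seq.upper()
    # A zero-width lookahead also reports overlapping matches, if any.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy sequence for illustration only (an assumption, not pneumococcal DNA).
fragments = digest("GGGTTAACCCCTTAAGG")
print(fragments, [len(f) for f in fragments])
```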

L. lactis transformant colonies were grown on brain heart infusion agar and nuclease
secreting (Nuc+) clones were detected by a toluidine blue-DNA-agar overlay (0.05
M Tris pH 9.0, 10 g of agar per litre, 10 g of NaCl per litre, 0.1 mM CaCl2, 0.03%
wt/vol. salmon sperm DNA and 90 mg of Toluidine blue O dye) essentially as
described by Shortle, 1983, supra and Le Loir et al., 1994, supra. The plates were
then incubated at 37°C for up to 2 hours. Nuclease secreting clones develop an
easily identifiable pink halo. Plasmid DNA was isolated from Nuc+ recombinant L.
lactis clones and DNA inserts were sequenced on one strand using the NucSeq
sequencing primer described in Appendix 1, which sequences directly through the
DNA insert.

Isolation of Genes Encoding Exported Proteins from 
S. pneumoniae 

A large number of gene sequences putatively encoding exported proteins in S.
pneumoniae have been identified using the nuclease screening system. These have
now been further analysed to remove artefacts. The sequences identified using the
screening system have been analysed using a number of parameters.

1. All putative surface proteins were analysed for leader/signal peptide 
sequences using the software programs Sequencher (Gene Codes Corporation) and 
DNA Strider (Marck, Nucleic Acids Res., 16:1829-1836 (1988)). Bacterial signal 
peptide sequences share a common design. They are characterised by a short 
positively charged N-terminus (N region) immediately preceding a stretch of 
hydrophobic residues (central portion-h region) followed by a more polar C-terminal 
portion which contains the cleavage site (c-region). Computer software is available 
which allows hydropathy profiling of putative proteins and which can readily 
identify the very distinctive hydrophobic portion (h-region) typical of leader peptide 
sequences. In addition, the sequences were checked for the presence of or absence of 
a potential ribosomal binding site (Shine-Dalgarno motif) required for translation 
initiation of the putative nuc reporter fusion protein. 

2. All putative surface protein sequences were also matched with all of the
protein/DNA sequences using the publicly available databases [OWL-proteins inclusive of
SwissProt and GenBank translations]. This allows us to identify sequences similar to
known genes or homologues of genes for which some function has been ascribed.
Hence it has been possible to predict a function for some of the genes identified
using the LEEP system and to unequivocally establish that the system can be used to
identify and isolate gene sequences of surface associated proteins. We should also be
able to confirm that these proteins are indeed surface related and not artefacts. The
LEEP system has been used to identify novel gene targets for vaccine and therapy.

3. Some of the proteins identified did not possess a typical leader
peptide sequence and did not show homology with any DNA/protein sequences in
the database. Indeed these proteins may indicate the primary advantage of our
screening method, i.e. the isolation of atypical surface-related proteins, which may
have been missed in all previously described screening protocols or approaches
based on sequence homology searches.
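
The hydropathy profiling mentioned under point 1 can be sketched in Python as a sliding-window average over the Kyte-Doolittle hydropathy scale. This is a generic illustration, not the algorithm used by Sequencher or DNA Strider; the window size, the function names and the example peptide are all assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each window; raises KeyError on non-standard residues."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophobic_window(protein: str, window: int = 9) -> tuple[int, float]:
    """Return (start index, mean hydropathy) of the most hydrophobic window."""
    profile = hydropathy_profile(protein, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


# Hypothetical peptide: charged N region, hydrophobic h region, polar remainder.
peptide = "MKKKLLLAVLLAVLLAASAQETDNGKEEKSD"
print(most_hydrophobic_window(peptide))
```

A strongly positive peak within roughly the first 30 residues, preceded by positively charged residues, is the pattern that would flag a candidate leader peptide for closer inspection.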

In all cases, only partial gene sequences were initially obtained. Full length genes
were obtained in all cases by reference to the TIGR S. pneumoniae database
(www.tigr.org). Thus, by matching the originally obtained partial sequences with
the database, we were able to identify the full length gene sequences. In this way, as
described herein, three groups of genes were clearly identified, i.e. a group of genes
encoding previously unidentified S. pneumoniae proteins, a second group exhibiting
some homology with known proteins from a variety of sources and a third group
which encoded known S. pneumoniae proteins, which were, however, not known as
antigens.



Example 2: Vaccine trials

pcDNA3.1+ as a DNA vaccine vector

The vector chosen for use as a DNA vaccine vector was pcDNA3.1 (Invitrogen)
(actually pcDNA3.1+; the forward orientation was used in all cases, but it may be
referred to as pcDNA3.1 from here on). This vector has been widely and successfully
employed as a host vector to test vaccine candidate genes to give protection against
pathogens in the literature (Zhang et al., Kurar and Splitter, Anderson et al.). The
vector was designed for high-level stable and non-replicative transient expression in
mammalian cells. pcDNA3.1 contains the ColE1 origin of replication which allows
convenient high-copy number replication and growth in E. coli. This in turn allows
rapid and efficient cloning and testing of many genes. The pcDNA3.1 vector has a
large number of cloning sites and also contains the gene encoding ampicillin
resistance to aid in cloning selection and the human cytomegalovirus (CMV)
immediate-early promoter/enhancer which permits efficient, high-level expression of
the recombinant protein. The CMV promoter is a strong viral promoter in a wide
range of cell types including both muscle and immune (antigen presenting) cells.
This is important for optimal immune response as it remains unknown as to which
cell types are most important in generating a protective response in vivo. A T7
promoter upstream of the multiple cloning site affords efficient expression of the
modified insert of interest and allows in vitro transcription of a cloned gene in
the sense orientation.

Zhang, D., Yang, X., Berry, J., Shen, C., McClarty, G. and Brunham, R.C. (1997)
"DNA vaccination with the major outer-membrane protein genes induces acquired 
immunity to Chlamydia trachomatis (mouse pneumonitis) infection". Infection and 
Immunity, 176, 1035-40. 

Kurar, E. and Splitter, G.A. (1997) "Nucleic acid vaccination of Brucella abortus 
ribosomal L7/L12 gene elicits immune response". Vaccine, 15, 1851-57. 

Anderson, R., Gao, X.-M., Papakonstantinopoulou, A., Roberts, M. and Dougan, 
G. (1996) "Immune response in mice following immunisation with DNA encoding 
fragment C of tetanus toxin". Infection and Immunity, 64, 3168-3173. 

Preparation of DNA vaccines 

Oligonucleotide primers were designed for each individual gene of interest derived
using the LEEP system. Each gene was examined thoroughly, and where possible,
primers were designed such that they targeted that portion of the gene thought to
encode only the mature portion of the protein. It was hoped that expressing
those sequences that encode only the mature portion of a target protein would
facilitate its correct folding when expressed in mammalian cells. For example, in the
majority of cases primers were designed such that putative N-terminal signal peptide
sequences would not be included in the final amplification product to be cloned into
the pcDNA3.1 expression vector. The signal peptide directs the polypeptide
precursor to the cell membrane via the protein export pathway where it is normally
cleaved off by signal peptidase I (or signal peptidase II if a lipoprotein). Hence the
signal peptide does not make up any part of the mature protein, whether it be
displayed on the surface of the bacterium or secreted. Where an N-terminal
leader peptide sequence was not immediately obvious, primers were designed to
target the whole of the gene sequence for cloning and, ultimately, expression in
pcDNA3.1.

Having said that, however, other features of proteins may also affect the
expression and presentation of a soluble protein. DNA sequences encoding such
features in the genes encoding the proteins of interest were excluded during the
design of oligonucleotides. These features included:

1. LPXTG cell wall anchoring motifs.

2. LXXC lipoprotein attachment sites.

3. Hydrophobic C-terminal domain.

4. Where no N-terminal signal peptide or LXXC was present the start codon was
excluded.

5. Where no hydrophobic C-terminal domain or LPXTG motif was present the stop
codon was removed.

Appropriate PCR primers were designed for each gene of interest and any and all of
the regions encoding the above features were removed from the gene when designing
these primers. The primers were designed with the appropriate restriction enzyme
site followed by a conserved Kozak nucleotide sequence (in most cases GCCACC was
used, except in occasional instances, for example ID59; the Kozak sequence
facilitates the recognition of initiator sequences by eukaryotic ribosomes) and an
ATG start codon upstream of the insert of the gene of interest. For example, for a
forward primer using a BamHI site, the primer would begin
GCGGGATCCGCCACCATG followed by a small section of the 5' end of the gene
of interest. The reverse primer was designed to be compatible with the forward
primer and with a NotI restriction site at the 5' end in most cases (this site is
TTGCGGCCGC), except in occasional instances, for example ID59, where an
XhoI site was used instead of NotI.

PCR primers

The following PCR primers were designed and used to amplify the truncated genes
of interest.

ID5
Forward Primer 5' CGGATCCGCCACCATGGGTCTAATTGAAGACTTAAAAAATCAA 3'
Reverse Primer 5' TTGCGGCCGCCAATGCTAGACTAAACACAAGACTCA 3'

ID59
Forward Primer 5' CGCGGATCCATGAAAAAAATCTATTCATTTTTAGCA 3'
Reverse Primer 5' CCCTCGAGGGCTACTTCCGATACATTTTAAACTGTAGG 3'

ID51
Forward Primer 5' CGGATCCGCCACCATGAGTCATGTCGCTGCAAATG 3'
Reverse Primer 5' TTGCGGCCGCATACCAAACGCTGACATCTACG 3'

ID29
Forward Primer 5' CGGATCCGCCACCATGCAAAAAGAGCGGTATGGTTATG 3'
Reverse Primer 5' TTGCGGCCGCACCCCCATTCTTAATCCCTT 3'

ID50
Forward Primer 5' CGGATCCGCCACCATGGAGGTATGTGAAATGTCACGTAAA 3'
Reverse Primer 5' TTGCGGCCGCTTTTACAAAGTCAAGCAAAGCC 3'
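
The primer design rules described above (restriction site, optional Kozak sequence, ATG) can be checked against the listed primers with a small Python sketch. The primer sequences are those listed for ID5, ID59 and ID51 with whitespace removed; the function names and the dictionary layout are assumptions made for this illustration.

```python
KOZAK = "GCCACC"
BAMHI = "GGATCC"
REVERSE_SITES = {"NotI": "GCGGCCGC", "XhoI": "CTCGAG"}

FORWARD = {
    "ID5": "CGGATCCGCCACCATGGGTCTAATTGAAGACTTAAAAAATCAA",
    "ID59": "CGCGGATCCATGAAAAAAATCTATTCATTTTTAGCA",
    "ID51": "CGGATCCGCCACCATGAGTCATGTCGCTGCAAATG",
}
REVERSE = {
    "ID5": "TTGCGGCCGCCAATGCTAGACTAAACACAAGACTCA",
    "ID59": "CCCTCGAGGGCTACTTCCGATACATTTTAAACTGTAGG",
    "ID51": "TTGCGGCCGCATACCAAACGCTGACATCTACG",
}


def describe_forward(primer: str) -> dict[str, bool]:
    """Report whether a Kozak sequence and an ATG follow the BamHI site."""
    rest = primer[primer.index(BAMHI) + len(BAMHI):]
    has_kozak = rest.startswith(KOZAK)
    if has_kozak:
        rest = rest[len(KOZAK):]
    return {"kozak": has_kozak, "atg": rest.startswith("ATG")}


def reverse_site(primer: str) -> str:
    """Name the cloning site (NotI or XhoI) carried by a reverse primer."""
    for name, site in REVERSE_SITES.items():
        if site in primer:
            return name
    raise ValueError("no NotI or XhoI site found")


for gene in FORWARD:
    print(gene, describe_forward(FORWARD[gene]), reverse_site(REVERSE[gene]))
```

Run on these primers, the check agrees with the text: ID5 and ID51 carry BamHI, Kozak and ATG with a NotI reverse site, whereas ID59 lacks the Kozak sequence and uses XhoI.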

Cloning 

The insert along with the flanking features described above was amplified using PCR
against a template of genomic DNA isolated from type 4 S. pneumoniae strain 11886
obtained from the National Collection of Type Cultures. The PCR product was cut
with the appropriate restriction enzymes and cloned into the multiple cloning site of
pcDNA3.1 using conventional molecular biological techniques. Suitably mapped
clones of the genes of interest were cultured and the plasmids isolated on a large
scale (> 1.5 mg) using Plasmid Mega Kits (Qiagen). Successful cloning and
maintenance of genes was confirmed by restriction mapping and sequencing ~700
base pairs through the 5' cloning junction of each large scale preparation of each
construct.

Strain validation

A strain of type 4 was used in cloning and challenge methods, which is the strain
from which the S. pneumoniae genome was sequenced. A freeze dried ampoule of a
homogeneous laboratory strain of type 4 S. pneumoniae strain NCTC 11886 was
obtained from the National Collection of Type Cultures. The ampoule was opened and
the culture resuspended with 0.5 ml of tryptic soy broth (0.5% glucose, 5%
blood). The suspension was subcultured into 10 ml tryptic soy broth (0.5% glucose,
5% blood) and incubated statically overnight at 37°C. This culture was streaked on
to 5% blood agar plates to check for contaminants and confirm viability and on to
blood agar slopes, and the rest of the culture was used to make 20% glycerol stocks.
The slopes were sent to the Public Health Laboratory Service where the type 4
serotype was confirmed.

A glycerol stock of NCTC 11886 was streaked on a 5% blood agar plate and
incubated overnight in a CO2 gas jar at 37°C. Fresh streaks were made and optochin
sensitivity was confirmed.

Pneumococcal challenge

A standard inoculum of type 4 S. pneumoniae was prepared and frozen down by
passaging a culture of pneumococcus 1x through mice, harvesting from the blood of
infected animals, and growing up to a predetermined viable count of around 10^9
cfu/ml in broth before freezing down. The preparation is set out below as per the
flow chart.

Streak pneumococcal culture and confirm identity
↓
Grow over-night culture from 4-5 colonies on plate above
↓
Animal passage pneumococcal culture
(i.p. injection; cardiac bleed to harvest)
↓
Grow over-night culture from animal passaged pneumococcus
↓
Grow day culture (to predetermined optical density) from over-night of animal
passage and freeze down at -70°C - this is the standard inoculum
↓
Thaw one aliquot of standard inoculum to viable count
↓
Use standard inoculum to determine effective dose (called Virulence Testing)
↓
All subsequent challenges - use standard inoculum to effective dose

An aliquot of standard inoculum was diluted 500x in PBS and used to inoculate the 
mice. 

Mice were lightly anaesthetised using halothane and then a dose of 1.4 x 10^5 cfu of
pneumococcus was applied to the nose of each mouse. The uptake was facilitated by
the normal breathing of the mouse, which was left to recover on its back.

S. pneumoniae vaccine trials

Vaccine trials in mice were carried out by the administration of DNA to 6 week old
CBA/ca mice (Harlan, UK). Mice to be vaccinated were divided into groups of six
and each group was immunised with recombinant pcDNA3.1+ plasmid DNA
containing a specific target-gene sequence of interest. A total of 100 µg of DNA in
Dulbecco's PBS (Sigma) was injected intramuscularly into the tibialis anterior
muscle of both legs (50 µl in each leg). A boost was carried out using the same
procedure 4 weeks later. For comparison, control groups were included in all
vaccine trials. These control groups were either unvaccinated animals or those
administered with non-recombinant pcDNA3.1+ DNA (sham vaccinated) only,
using the same time course described above. Three weeks after the second immunisation,
all groups of mice were challenged intra-nasally with a lethal dose of S. pneumoniae
serotype 4 (strain NCTC 11886). The number of bacteria administered was
monitored by plating serial dilutions of the inoculum on 5% blood agar plates. A
problem with intranasal inoculations is that in some mice the inoculum bubbles out
of the nostrils; this has been noted in the results table and taken account of in
calculations. A less obvious problem is that a certain amount of the inoculum for
each mouse may be swallowed. It is assumed that this amount will be the same for
each mouse and will average out over the course of inoculations. However, the
sample sizes that have been used are small and this problem may have significant
effects in some experiments. All mice remaining after the challenge were killed 3 or
4 days after infection. During the infection process, challenged mice were monitored
for the development of symptoms associated with the onset of S. pneumoniae
induced disease. Typical symptoms in an appropriate order included piloerection,
an increasingly hunched posture, discharge from eyes, increased lethargy and
reluctance to move. The latter symptoms usually coincided with the development of
a moribund state, at which stage the mice were culled to prevent further suffering.
These mice were deemed to be very close to death, and the time of culling was used
to determine a survival time for statistical analysis. Where mice were found dead,
the survival time was taken as the last time point when the mouse was monitored
alive.

Interpretation of Results 

A positive result was taken as any DNA sequence that was cloned and used in 
challenge experiments as described above which gave protection against that 
challenge. Protection was taken as those DNA sequences that gave statistically
significant protection (to a 95% confidence level (p<0.05)) and also those which
were marginal or close to significant using Mann-Whitney, or which showed some
protective features, for example where there were one or more outlying mice or where the
time to the first death was prolonged. It is acceptable to allow marginal or
non-significant results to be considered as potential positives when it is considered that
the clarity of some of the results may be clouded by the problems associated with the
administration of intranasal infections.

p value 2 refers to significance tests compared to pcDNA3.1+ vaccinated controls

Statistical Analyses.

Trial 1 - None of the other groups had significantly longer survival times than the
controls. The survival times of the unvaccinated and pcDNA3.1 control groups were
not significantly different. One of the mice from ID5 was an outlying result and the
mean survival times for ID5 were extended but not significantly so.
Trial 2 - The group vaccinated with ID59 had significantly longer survival times than
the unvaccinated control group.
Trial 5 - The group vaccinated with ID59 again survived for an average of almost 10
hours longer than the controls but the results were not quite statistically significant.
Trial 6 - The group vaccinated with ID51 did not have survival times significantly
higher than unvaccinated controls (p= <36.0), however, there were 2 outlying mice
in the vaccinated group.

Vaccine trials 7 and 8 (See figure 2)

Mean survival times (hours)

Mouse       Unvacc        ID29 (7)    Unvacc        ID50 (8)
number      control (7)               control (8)

1           59.6          73.1        45.1          60.6
2           47.2          54.8        50.8          60.6
3           59.6          59.3        60.4          51.1
4           70.9          54.8*       55.2          60.6
5           68.6*         59.3        45.1          60.6
6           76.0          54.8        45.1          60.6

Mean        63.6          59.35       50.2          59.1
sd          10.3          7.1         6.4           3.9
p value 1                 <39.0                     0.0048

* - bubbled when dosed so may not have received full inoculum.
T - terminated at end of experiment having no symptoms of infection.
Numbers in brackets - survival times disregarded assuming incomplete dosing
p value 1 refers to significance tests compared to unvaccinated controls

Statistical Analyses. 

Trial 7 - The ID29 vaccinated group showed prolonged times to the first death.
Trial 8 - The group vaccinated with ID50 survived significantly longer than 
unvaccinated controls. 
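
The Mann-Whitney comparison referred to under Interpretation of Results can be illustrated in Python with the Trial 8 survival times reported above. This is a minimal sketch only: the document does not state whether the reported p value was one- or two-sided, or exact or approximate, so the exact one-sided permutation p value computed here is an assumption about method and need not match the reported figure. Function names are illustrative.

```python
from itertools import combinations

# Trial 8 survival times (hours) as reported in the table above.
CONTROL_8 = [45.1, 50.8, 60.4, 55.2, 45.1, 45.1]
ID50_8 = [60.6, 60.6, 51.1, 60.6, 60.6, 60.6]


def mann_whitney_u(treated: list[float], control: list[float]) -> float:
    """U statistic: pairs where treated outlives control, ties counted as 0.5."""
    return sum((t > c) + 0.5 * (t == c) for t in treated for c in control)


def exact_one_sided_p(treated: list[float], control: list[float]) -> float:
    """Fraction of all group relabellings with U at least as large as observed."""
    pooled = list(treated) + list(control)
    u_obs = mann_whitney_u(treated, control)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        t = [pooled[i] for i in chosen]
        c = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        hits += mann_whitney_u(t, c) >= u_obs
    return hits / total


u = mann_whitney_u(ID50_8, CONTROL_8)
p = exact_one_sided_p(ID50_8, CONTROL_8)
print(f"U = {u} of a possible {len(ID50_8) * len(CONTROL_8)}, one-sided exact p = {p:.4f}")
```

With six mice per group there are only 924 possible relabellings, so full enumeration is practical and avoids relying on a normal approximation for such small samples with tied values.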

Appendix 1 - Oligonucleotide primers

nucS1 (BglII, EcoRV)
5'- cgagatctgatatctcacaaacagataacggcgtaaatag -3'

nucS2 (BglII, SmaI)
5'- gaagatcttccccgggatcacaaacagataacggcgtaaatag -3'

nucS3 (BglII, EcoRV)
5'- cgagatctgatatccatcacaaacagataacggcgtaaatag -3'

nucR (BamHI)
5'- cgggatccttatggacctgaatcagcgttgtc -3'

NucSeq
5'- ggatgctttgtttcaggtgtatc -3'

pTREPF
5'- catgatatcggucctcaagctcatatcaagtccggcaatggtgtgggctmtttgttttagcggataa
caatttcacac -3'

pTREPR
5'- gcggatcaxcgggcn^ttaatgtttaaacactagtcgaagatctcgcgaauctcctgtgtgaaatt
gttatccgcta -3'

pUCF
5'- cgccagggttttcccagtcacgac -3'

VR
5'- tcaggggggcggagcctatg -3'

V1
5'- tcgtatgttgtgtggaattgtg -3'

V2
5'- tccggctcgtatgttgtgtggaattg -3'

TABLE 1 



ITU 1200 bo 



£^ G ;^ 

mCGCCGTTCCTOTmAGGAATCTCTCrrACGAATrGGG^ 

AGTGGCACTAGTGACAACAGTGCCATCTGTAGCAGAAGGACTGAAGAATCTAAAT^^^ 
in ^^^GCAAGTGCCAAAGAAGCAATTAAAGAAGAAAAATTAAAAGCnTA^ 
10 GTGTTCTAAAGGCAGTTTATCATGGCGAAACATCGCTTGAAAATGGAATTAAATTTGA^^ 

ic A ^J^AAGTCGTT^ 

15 ATmAAC^ATATTGGGATCTATGTTGTAGGTGGTCTGGCTGCC 

TCAGTCTGGTATTrTGGATCACTTGGGAGATGCTATCTCACTGAATACC^ 

GA ^ A iiTr G ;^^ 

CATGGATTTCACTTGCTATTACAGTGATTTTTGCGGTGGTAGCAACAGGATTTATCGGACGCATGTATC 
CGTTCTTCAAACGGATGATTTAGGGATTTGGAAAACCTTTAAACGTGCCTTATC^ 

MRNMWVVKETYLRHVESWSFFIWISPFLFLG^ 
EASAXEAKEEKUCGYLTtlXJEDSVUCAVra 

DEAXE>0aCHQmAGALGFFLYMILrrYAGVTAQEVA5EKGTKIMEWFSSIRA^ 
GL\AVLI^DLPFLAQSGILDHlX»DAJSLrnxm^ 

AAGDNLlXKIGSYIPnSTFFMPFRTI>n5YAGGA£AWlSLAn'VIFAVVATGFIGRMYASLVLQTDDLGIW 
ID5 1125 bp

CCTGGGAAAGTCTTGAAAATTATGATAGAATGGTGGAAGGAAAAATTCA 

GAAAGTCTTCTCGTATCCATTGTAATCAGTGCATACAATGAAGAAAAATATCTGCCTGGTCTAArr^ 
AAAATCAAACCTATCCTAAAGAGGATATTGAAATTCTATTTATAAATGCTATGTCCACAGATGGGACCACAGCT 
TCATTCAGCA^TrrATAAAGGAAGATACAGAGTTTAACTCAATTAGATrGTATAJVCAATCCTAA 
CTAGTGGTTTTAACCTGGGAGTTAAACATTCTGTAGGGGACCTTATTTTAAAA^ 
TGAGACTTTTGTAATGAACAATGTGGCTATTATrCAACAAGGTGAATTTGTCTGTG 
GTCGAAGGAAAAGGAAAATGGGCAGAGACOTGCATCTTGTTGAGGAAAATATGTTrGGCAGTAGCAT^ 
TATCGAAATAGTTCTGAGGATAGATATGTTTCTTCTATTTITCATGGAATGTATAAAC 
W TTGGTTTAGTAAj\TGAGCA>ACTTGGCCGAACTGAAGATAATGATATTCATTATAGAATTCGAGA^ 
A^TCCGCTATAGCCCAAGTATTCTATOTATCAGTAT^ 

TCAAATGGTTTGTGGATTGGCTTGACAAGT(^TGTTCAGTTTAAGTGTTTATCA 

ATTTGTTTTGAGTCTTGTGTTTAGTCTAGCATTGTTACCGATCA 
ac ATTTrCTACTTTTGTCATTACTCACTTTGCTGACTTTATTA^ 
* J ATmATTn*CCATTCACTTTGCTTATGGCCTTGGGACGATTGTAGG^ 

AGTACAAGAGAACAATAATTTATTTXjGATAAAATAAGCCAAATAAATCAAAATATGCTAT 

WKVUUMIEWWKEKHW^ 

EDTEFNSIRLYNNPKKNQASGFNLGVKHSVGD^^ 

JU L HLVEENMFGSSIANYRNSSEDRYVSSIFHGMYKI^VFQKVGLVNEQLGRTEDNDIHYR1REYGYKIRYSPSILSY0YIRP 
TFKKMLHQKYSNGLWIGLTSHVQFKO^Utt^^ 

UVMPF1LFSIHFAYGLGTIVGURGFKWKXEYKRTUYLDKISQINQNML2 



IDU 696 bp 

ATGATGAAAGAACAAAATACGATAGAAATCGATGTAmCAj\TTAGTT 

ATTTTAATAGTGGCACTTGTGACAGGTGCGGGGGCTTTTGCATATAGCACT^ 

CTACCACGCGAATITACGTAGTGAATCGCAATCAAGGAGACAAGCCGGGGTTC 

gaacttatctggtaaaaga ctacc gtgagattatccttt 

actagatttgacgccaaaaggtttggctaataaaattaaagtgacagtaccagttgatacccgtattgtctct 

TCAGTTAATGATCGACnTCCTGAAGACKiCAAGCCGTATCGCTAACTCTTrGAGA 

ATG\GTATTACTCGTGTTTCTGA CGTGAC AACACTGGAGGAGGCAAGGCCGGCGATATCCCCGTCTTCGCCAAAT 
ATTAAACGCAATACACTAATTGGTTTTrTGGCAGGGGTGATTGGAACT^ 
GGATACTCGTGTGAAACGTCCGGAAGATATCGAAAATAGATTGCAGATGACACTTTTGGG
GGGTAAGTTGAAATAG 

MMKEQhnTEIDVFQLVKSLWKRKLMIUVALVTGAGAFAYSTFWKPE 

VKDYR£IU^QDVI^VVSDLKU)LTPKGIj\NKK 

TliEAJtfAISPSSPNIKROTUGFI^GVrcrTSW 

ATGGTAAAAGTAGCAGTTATATTAGCTCAGGGCTTTGAAGAAATTGAAGCCTTGACAGTTGTA 

GAGCCA ATATC ACATGTGATATGGTTGGTTTTGAAGAGCAAGTAACGGGTTCGCATGCAATCCAAGTAAGAGCAG 

ATCATGTCTTTGATGGAGATTTATCAGACTATGATATGATTGTTCTTCCTGGAGGTATGCCTGGTTCTGC^ 

CGTGATAATCAGACCTTGATTCAAGAATTGCAAAGmCGAGCAAGAAG 

GCACCAATTGCCCTCAATCAAGCAGAGATATTGAAAAATAAGCGATACACTTGTTATGACGGC^ 

ATCCTTGATGCTCACTACGTCAAGGAAACAGTAGTGGTAGATGGTCAGTTGACA^ 

GCCCTTGC CTTTG CCTACGAGTTGGTGGAGCAACTAGGAGGGGACGCAGAGAGTTTACGAACAGGAATGCTCTAT 
CGAGATGTCTTTGGTAAAAATCAGTAA 

MVKVAVILAQGFEEIEALTVVDVLRRANITCDMVGFEEQV^ 

DNQTLIQELQSFEQEGKK1j\AICAAPIALNOAE1LKNKRYTCYDGVQEQILDGHYVKETVWDGQLTORGPSTALAFA 
YELVEQLGGDAESLRTGMLYRDVFGKNQZ 

GTGGTAGGGATGGTAGAACCAAACCTAGAAAGCCTrATAAAAGATCTTTACAATCATGCTCGACAT^ 

G AAG ATTT AGTTGCTG CTCTCCT AG AG ACT ACT AAAAAACTG CCT ACT AC AAATG AGCAATTGC AGG C AGTTCGTC 

TCTCAGGCCTGGTCAATCGTGAATTGCTCCTAAATCCCAAACATCGAGCACCTGAGTTGCTCAAC^ 

TGTCAAAAGAGAAGAAGCCAAGTACAGAGGAACTGCGACrrcnKrGCTTATGTATGAGGAACTCm 

TTGA 

MVGMVEPNLESUKDLYNHARHDLSEDLVAALLETTKKLPTTNEQLQAVR^ 
REEAKYRGTATSALMYEELFKMLZ 

ID29945bB 

TTGTTCTrAAAAAA GGAA AGAGAGGTAATCAGCATGCGTAAATGGACAAAAGGATTTCTCATCTT^ 
ACTACCGTTATCGGCTTTATCCTGCI I I " I"l GTAGGTATCCAATCTGACGGGAtT AAG AGCCTACTTTCCATGTCCAA 
AGAACCTGTCTATGATAGCCGTACGGAAAAGCTAACCTTTGGCAAGGAAGTCGAAAACCTAGAAATTACT 
CCAACACACGCTCACCATCACAGACTCTTTCG ATG ATCAAATCCACA 1 ' I ' l C l ' l ACCATCCATCTCTTTCTGCTCAC 
CATGATCTTATCACCAATCAGAACGATAGAACTCTGAGTCTCACTGATAAGAAACTGTCTGAAACTCCGTTTCTCT 
CTTCTGGAATTGCTrGCKjATTCTTCATATCGCA^ 

AAAAGGGAGAACTCTAAAAGGGATCAACATCTCAGCCAATCGCGGACAAACCACCATCATAAATGCTAGCCTTGA 
AAATGCGACCCTCAATACAAACAGCTATATCCTCCGAATTGAAGGAAGTCGTATCAAAAACAGTAAACTCACAAC 
GCCCAATATCGTTAATATCrTTGATACAGTTCTTACAGATAGTCAGCTAGAGTCAACAGAGAATCACTrCCACGCT 
GAAAATATCCAAGTCCATGGCAAGGTTGAACTGACTGCCAAAGATTATCTCAGAATCATCCTAGACCAGAAAGAA 
AGCCAACGAATTAACTGGGACATCTCAAGCAACTATGGTTCTATCTTCCAATTCACAAGAGAAAAGCCTGAATCA 
AGAGGTACGGAATTAAGCAACCCTTACAAAACTGAAAAAACCGATGTCAAGGATCAACTCATTGCGAGATCTGAT 
GATAATATTGATCTAATATCCACACCAAGCAGACGTTGA 

MFUCKEREVISMRKWTKGFUFGVVTTVIGH^^ 

TDSFDIX3IHISYHPSLSAHHDLn>fQNDRTlSLTDKKL£ETPFl^SGlGGlLH^ 

GCrmiNASLENATl^NSmRIEGSRIKNSKLTTPNIVNIFDTVLTDSQLESTENHFHAENIQVHGKVELTAKDYLRIILD 
QKESQRJNWDISSNYGSIFQFTREKPESRGTELSNPYKTEKTDVKDQUAR5DDNIDLISTPSRRZ 

ATGAAACAAGAATGGTITGAAACTAATGATTTTCTAAAAACAACAAGCAA 

AGAGGTTGCAGACAAGGCTGAAGAAACGATAGCCGATCTCGATACACCAATTGAAAAAAATACTCAGT^ 

AGGAAGTCCCTCAAGCTGAAGTCGAATTGGAAAGCCAGCAAGAAGAGAAAATTGAAGCTCCTGAAGACAGTGAA 

GCGAGAACAGAAATAGAAGAAAAGAAGGCATCTAATTCTACTGAAGAAGAGCCAGACCTTTCTAAAGAAACAGA 

AAAACTCACTATAGCTGAAGAGAGCCAAGAAGCTCTTCCTCAGCAAAAAGCAACCACGAAAGAGCCAC^ 

CA GTAAAT CTTTAGAAAGTCCTTATATCCCCGACCAAGCTCCAAAATCTAGGGATAAATGGAAAGAGCAACT 

TGATTTTTGCTCTTGGCTAGTGGAAGCGATCAAATCT 

ACAGCCITTCTCTrGCTCATTCTGTTTTCTG CATCTTCC I " I ' 1 ' I I CrrrAGTATCTATCACATCAAACATGCTTACTAT 
GGACATATAGCAAGCATTAACAGTCGCTTCCCTGAGCAGCTAGCTCCTTTAACT Cl ' 1 1 I I 1 CTATCATCTCTATCCT 
AGTAGCG ACAACACTC 1 1 CI 1 C" 1 1" I" rCATTCCTCTTGGGTAGTTTCGTTGTGAG ACGATTTATCCACCAGG AAAAG
GACTGGACGCTAGACAAGGTTCTCCAACAATATAGTCAACTCTTGGCAATTCCAATCTCCT 
TTTCTTTGCTTTCTTTGATAGCCTACGATTTACAGCCCT i CCTCACTGCTATTGCTAG 

MKQEWFESNDFVKTTSK^PEEQAQEVAOKAEETIAOLDTPIEKNTQLEEE^ 

EKKASNSTEEEPDLSKETEKVTIAEESQEAU>QQKATTKEPUJ^ 

PTSKLETSrrHSYTAHXULFSASSFFFSIYHIKHAra 

QEKDWTLDKVLQQYSQ1XAIPISSLLLLVSLLSLIAYDLQPSCVZ rrario-uor vruu-in 

GCATGCGTAAATGGACAAAAGGATITCTCATCTTTGGTGTGGTGACTACCGTTATCGGCTT^ 

GGTATCCAATCTGACGGGATTAAGAGCCTACTTTCCATGTCCAAAGAACCTGTCTATGATAGCCGTACGGAAAAG 

CTAACCTTTGGCAAGGAAGTCGAAAACCTAGAAATTACTCTCCACCAACACACGCTCACCATCACAGACT 

GATGATCAAATCCACAl l i L 1 1 ACCATCCATCTCTTTCTGCTCACCATGATCTTATCACCAATCAG AACGATAGAA 

CTCTGAGTCTCACTGATAAGAAACTtTrCTGAAACTCCGTTTCTCTCTT 

AAGTAGCTACTCTAGTCGTTTTGAAGAAGTTATTCTCCGACTACCA^ 

CTCAGCCAATCGCGGACAAACCACCATCATAAATGCTAGCCTTGAAAATGCGACCCTCAATACAAACAGCTATAT 

CCTCCGAATTGAAGGAAOTCGTATCAAAAACAGTAAACTCACAACGCCCAATATCGTTAATATC^ 

CTTACAGATACTCAGCTAGAGTCAACAGAGAATCACTTC 

CTGACTGCCAAAGATTATCTCAGAATCATCCTAGACCAGAAAGAAAGCCAACGAATTAACTGGGACATCTCAAGC 

AACTATGGTTCTATCTTCCAATTCACAAGAGAAAAGCCTGAATCAAGAGGTACGGAATTAAGCAACCCTTAC 

ACrGAAAAAACCGATGTCAAGGATCAACTCATTGCGAGATCTGATGATAATATTGATCTAATATCCACACCAAGC 
AGACGTTGA 

MQUSSVYSLFVWYNU^KKEREVISMWC\VTKGFLIFGVVTT^ 
KEVEhn-EnXHQHTLTrTDSFDDQIHISYHPSLS^^ 
RIJKGRTUCGINISANRGQTTIINASLENATL^^ 
HGKVELTAKDYLRfllJX^KESQRJhrWDISSNYGSff^ 

n>i<n-78b 0 

ATGATATGTAAAATGAAGCAGGGAGGGAGCAGGGCGTGCTGGGGATGGAGAGTGGGGGAGGGACGCrGCTATTT 
TAATC 

MICKMKQGGSRACWGWRVGEGRCYFN 
ID109 714 bp

CGATAAAGAGGCCTTGAGTAATCTCAATTTGCAGATTGAAAATGGAGAGA 
GGCTGGAAAATCGACCACTATAAAATCCCTAGTCAGTATCATTTCACCCA 
CAGGACTTTATCGGAAAATCGCTTGGCTATTAAACGAAAGATTGGCTACGTAGCAGACrCXjCCT 
GCTTAACGGCCAATCAATnTCGGAATTGATCGCCTCATCCTATGATCTGAOT 

AGCTAGGC TATT GAACGl i I 1 i GATTTTGCTG AAAATCG CTATCAGGTTATTG AAACTCTTTCTCACGG AATGCGT 
CAGAAAGTCTTTG TCAT CGGAGCACTCrTGTCTGATCCCGATATTTGG 

ATCCCGAGGCTGCCTTTGATTTGAuAACAGATGATGAAGGAACATGCACAAAAAGGGAAGACAGTCl IOU 1 1 CAA 
CTCATCTCCTAGAGGTGGCAGAGCAAGTCTGTGATCGGATTGCCATTTTGAAAAAGGGG^ 
TAAGGTAGAGGACTrGAGGAAAGACCACCCAGACCAGTCTTTCK}AAAGTATCTACCTTAGTCTTGCTGGTA 
AGAGGAGGTTGCGGATGCGTCTCAAGGTCATTAA 

DKEAl^NLNUJIENGEIMGUGHNGAGKSTTlKSLVSflSPSSG 

WEUASSYDURSDLEASI^RLLNVFDFAENRYQVIETLSHGMRQKVFVI^ 

MMKEHAQKGKTVLFSTHVLEVAEQVCDRIAIUCKGHUYCGKVEDUUCDH^ 

ID112 360 bp

ATGGc i i iui i 1 l CAGAG AGAGGAGCAGTACGGAAGACACCAATGGCAAGTCCAATAATGAGACCTATG ATGGTT 
CCGACGATAGAGATTAAAAGAGTGATACCAGCACCACGCAAGAGTTGTTGCCAGTTTTCAGAAAGAATTTTAGCA 
ACTTGGCTAAAGAAACTACTGCTAGTCTCTTG\GTrGTTGTAGCTTCGGCAG 

TCAAG GC A ACTTG GTCATCTTTTG AA ATGGTTTC A ATG CTG GC ATTG ATTTGG CT AATACG ATTGTCATTTTT ACG A 
AGCCCGATAGCGATAGCTGTATCTTCrTCCCCAGTTTTGAAACCAGGTTCTACTTGA 
MALFSERGAVRKTPMASPIMRPMMVPTIE1KRVIPAPRKSCCQFSEW
FEMVSMIAUWURLSFLRSPIAIAVSSSPVLKPGST2 

ID 128 - 3.43 

ATGAAATTTAGTAAAAAATATATAGCAGCTGGATCAGCTGTTATCGTATC 

CTrGAGTCTATGTGCCTATGCACTAAACCAGCATCGTTCGCAGGAAAATA 

AGGACAATAATCGTCTCTCTTATGTGGATGGCAGCCAGTCAAGTCAGAAA 

AGTGAAAACTTGACACCAGACCAGGTTAGCCAGAAAGAAGGAATTCAGGC 

TGAGCAAATTGTAATCAAAATTACAGATCAGGGCTATGTAACGTCACACG 

GTCACCACTATCATTACTATAATGGGAAAGTTCCTTATGATGCCCTCnT 

AGTGAAGAACTCTTGATGAAGGATCCAAACTATCAACTTAAAGACGCTGA 

TATTGTCAATGAAGTCAAGGC?TGGTTATATCATCAAGGTCGATGGAAAAT 

A1TATGTCTACCTGAAAGATGCAGCTCATGCTGATAATGTTCGAACTAAA 

GATGAAATCAATCGTCAAAAACAAGAACATGTCAAAGATAATGAGAAGGT 

TAACTCTAAT GTTG CTGTAGCAAGGTCTCAGGGACGATATACGACAAATG 

ATGGTTATGTCTTTAATCCAGCTGATATTATCGAAGATACGGGTAATGCT 

TATATCGTTCCTCATGGAGGTCACTATCACTACATTCCCAAAAGCGATTT 

ATCTGCTAGTGAATTAGCAGCAGCTAAAGCACATCTGGCTGGAAAAAATA 

TGCAACCGAGTCAGTTAAGCTATTCTTCAACAGCTAGTGACAATAACACG 

C AATCTGT AG C AAAA GG ATC AACT AG CAAGC C AG C AAAT A A ATCTG A AAA 

TCTCCAGAGTCTTTTGAAGGAACTCTATGATTCACCTAGCGCCCAACGTT 

ACAGTGAATCAGATGGCCTGGTCTTTGACCCTGCTAAGATTATCAGTCGT 

ACACCAAATG GAGT TGCGATTCCGCATGGCGACCATTACCACTTTATTCC 

TTACAGCAAGCTTTCTGCCTTAGAAGAAAAGATTGCCAGAATGGTGCCTA 

TCAGTGGAACTGGTTCTA CAGT TTCTACAAATGCAAAACCTAATGAAGTA 

GTGTCTAGTCTAGGCAGTCTTTCAAGCAATCCTTCTTCTTTAACGACAAG 

TAAGGAGCTCTCTTCAGCATCTGATGGTTATATTTTTAATCCAAAAGATA 

TCGTTGAAGAAACGGCTACAGCTTATATTGTAAGACATGGTGATCATTTC 

CATTACATTCCAAAATCAAATCAAATTGGGCAACCGACTCTTCCAAACAA 

TAGTCTAGCAACACCTTCTCCATCTCTTCCAATCAATCCAGGAACTTCAC 

ATGAGAAACATGAAGAAGATGGATACGGATTTGATGCTAATCGTATTATC 

GCTGAAGATGAATCAGGTTTTGTCATGAGTCACGGAGACCACAATCATTA 

TTTCTTCAAGAAGGACTTGACAGAAGAGCAAATTAAGGTGCGCAAAAACA 

TTTAG 

MKFSKKYIAAGSAVIVSLSLCAYALNQHRSQENKDNNRVSYVDGSQSSQK 

SENLTPDQVSQKEGIQAEQIVDCITDQGYVTSHGDHYHYYNGKVPYDALF 

SEELLMKDPNYQLKDADIVNEVKGGYIIKVDGKYYVYLKDAAHADNVRTK 

DEINRQKQEHVKDNEKVNSNVAVAR5QGRYTTNDGYVFNPADIIEDTGNA 

YIVPHGGHYHYIPKSDLSASELAAAKAHLAGKNMQPSQLSYSSTASDNNT 

QSV AKGSTSKPANKSENLQSLLKELYDSPSAQRYSESDGLVFDPAKIISR 

TPNGVAJPHGDHYHFIPYSKLSALEEKIARMVPISGTGSTVSTNAKPNEV 

VSSLGSLSSNPSSLTTSKELSSASDGYIFNPKDIVEETATAYIVRHGDHF 

HYIPKSNQIGQPTLPNNSLATPSPSLPINPGTSHEKHEEDGYGFDANRII 

AEDESGFVMSHGDHNHYFFKKDLTEEQIKVRKNI* 
TABLE 2
ID2 840 bp 

ATGGGAATTGCTCTAGAAAATGTGAATTTTACATATCAAGAAGGTACT 
TTTCTTTGACGATTGAAGATGGCTCTTATACAGCTT^ 

ACTCTTAAATGGTTTATTGGTGCCAAGTCAAGGGAGTGTGAGGGTTTITGATACCTTAATCACCT 
AAT^GATATTCGTCA AATTAG AAAACAGGTTGGCTTGGTATTTCAGm 
^GTTTTGAAGGACGTTGCITTTGGACCGCAAAATTTTGGAGTTTCTG 
G ;J^^ GGCTCTGG ™^ 

CGTGTTGCCATTGCAGGCATACTTGCCATGGAGCCAGCTATATTAGTCTTAGATGAGCCAACAGCTGGTCT 

CrCTAGGGAGAAAAGAGTTGATGACCCTGTTCAAAAAACTCCACCAGTCAGGGATGACCATCGTCTTGGTAACGC 

ATTTGATGGATGATGTTGCTGAATATG CGAAT CAAGTCTATGTAATGGAAAAGGGACGTTTAGTAAAGGGGGGCA^ 

AACCAAGTGATGTCTTTCAAGACGTTGTTTTTATGGAAGAAGTTCAGTTGGGAGTACCTAAAATTACGGCCT^ 

LD TAAACGATTGGCTGATAGAGGCGTGTCATTTAAACGATTACCGATTAAGATAGAGGAGTTCAAGGAGTCGCTAAA 
TGGATAG 

MGtAUNVNFTYQEGTPLASAAUDVSLTIEDGSYTALW^ 
. Q"^QVGLVFQFAENQIFEETVUCDVAFGPQNFGVSEEDA^^ 
M£PAlLVUDEPTAGLDPLGRKEl^Tli : KKUiQSGMTIVLVTHLMDDVAEYANQVYVMEKGRL 
FMEEVQLGVPKITAFCKRLADRGVSFKRLPIKIEEFKBSLNG2 

ID3 6360 bp

TACCCGGTAGTCTTAGCAGACACATCTAGCTCTGAAGATGCTTTAAACATCTCTGATAAAGAAAAAGTAGCAGAA
AATAAAGAGAAACATGAAAATATCCATAGTGCTATGGAAACTTCACAGGATTTTAAAGAGAAGAAAACAGCAGTC 
ArrAAGGAAAAAGAAGTTGTTAGTAAAAATCCTGTGATAGACAATAACACTAGCAATGAAGAAGCAAAAATCAA 
AGAAGAAAATrCCAATAAATCCCAAGGAGAmTACGGACTCATrrGTGAATAAAAACACAGAAAATCCCAAAA^ 
AGAAGATAAAGTTGTCT ATAT TGCTGAATTTAAAGATAAAGAATCTGGAGAAAAAGCAATCAAGGAACTATCCAG 

TCTTAAGAATACAAAAGTTTTATATACTTATGATAGAATTTTTAACGGTAGTGCCATAGAAACAACT

TTGGACAAAATTAAACAAATAGAAGGTATTTCATCGGTTGAAAGGGCACAAAAAGTCCAACCCATGATGAATCA 

GCCAGAAAGGAAATTGGAGTTGAGGAAGCTATTGATTACCTAAACrrCTATCAATGCTCCGTTTGGGAA 

GATGGTAGAGGTATGGTCATTTCAAATATCGATACTGGAACAGATTATAGACATAAGGCTATGAGAATCGATGAT 

GATGCCAAAGCCTCAATGAGATTTAAAAAAGAAGACTTAAAAGGCACTGATAAAAATTATTGGTTGAGTGATAAA 

ATCCCTCATGCGTTCAATTATTATAATGGTGGCAAAATCACTGTAGAAAAATATGATGATGGAAGGGATTATTTTG 

ACCCACATGGGATGCATATTGCAGGG ATTCT TGCTGGAAATGATACTGAACAAGACATCAAAAACTTTAACGGCA 

TAGATGGAATTGCACCTAATGCACAAATTTTCTCTTACAAAATGTATTCTGACGCAGGATCTG 

TGAAACAATGTTTCATGCTATrGAAGATTCTATCAAACACAACGTTGATGTTCrmCGGTATCATCTGG^ 

GGAACAGGTCTTGTAGGTGAGAAATATTGGCAAGCTATTCGGGCATTAAGAAAAGCAGGCAT^ 

W GCTACGGGTAACTATGCGACTTCTGCTTCAVVGTrCnTCATGGGATTTAGTAGCAAATAATCATCTGA^ 

ACACTGGAAATGTAACACGAACTGCAGCACATGAAGATGCGATAGCGGTCGCTTCTGCTAAAAATCAAACAGTTG 
AGTTTGATA^AGTTAACATAGGTGGAGAAAGTTTTAAATACAGAAATATAGGGGCCTTTTTCGATA^ 
TCAG\ACAAj^TGAAGATGGAACAAAAGCTCCTAGTAAATTAAAATTTGTATATATAGGCAw\GGGGCAAGACCA^ 
ATTTGATAGGTTTGGATCTTAGGGGCAAAATTGCAGTAATGGATAGAAT^ 

TAAAAAAGCTATGGATAAGGGTGCACGCGCCATTATGGTTGTAAATACTGTAAATTACTACAATAGAGATAATTG
GACAGAGCTTCCAGCTATGGGATATGAAGCGGATGAAGGTACTAAAAGTCAAGTCTrmCAAm 
TGCTTCn-AAAGCTATGGAACATGATTAATCCTGATAAAAAAACTGAAGTCAAAAGAAATA^ 
AGATAAATT GGAG CAATACTATCCA^TTGATATGGAAAGTmAATTCCAACAAACCGAATGTAGGTGACGAAAA 
AGAGATTGACTTTAAGTTTGCACCTGAC ACAGA CAAAGAACTCTATAAAGAAGATATCATCGTTCCAGCAGCATC 

TACATCTTGG GGG CC AAG AAT AG ATTT ACTTTT A AAACCCG ATGTTTCAG CACCTGGT AAA AAT ATT AAATCC ACG

CTTAA TGTTA TTAATGGCAAATCAACrrATGGCTATATCT^ 
CTACTGTTTTGATTAGACCGAAATTAAAGGAAATGCTTGAAAGACCTGTAT^ 
AAATAGATCTTACAAG TCTTA CAAAAATTGCCCTACAAAATACTCCGCGACCTATGATGGA 
AAGAAAAAAGTCAA TACTT TGCATCACCTAGACAACAGGGAGCAGGCCTAATTAATGTGGCCAATGCTn 

ATGAAG7TGTAGCAACTTTCAAAAACACTGATTCTAAAG

AATAAAAGGTGATAAAAAATACTTTACAATCAAGCITCACAATACATCAAACAGACCTrTGACTT^ 
GCATCAGCGATAACTACAGATTCTCTAACTGACAGATTAAAACTTGATGAAACATATAAAGATGAj^AAAT^ 
GATGGTAAGCAAATTGTrCCAG AAATr CACCCAGAAAAAGTCAAAGGAGCAAATATCACATTrGAGCATGATACT 
TTCACTATAGGCGCAAATTCTAGCTTTGATTTGA^TGCGGTTATAAATGT^ 

TTTGTAGAAT CATT TATrCATTrTGAGTCAGTGGAAGCGATGGAAGCrCTAAACTCCAGCGGGAAGAAA

TTCCAACC 1 1 UTI GTCGATGCCTCTAATGGGATTTGCTGGGAATTGGAACCACGAACCAATCCTTGATAAATGGG 
CTTGGGAAGAAGGGTCAAGATCAAAAACACTGGGAGGTTATGATGATGATGGTAAACCGAAAATTCCAGGAACCT 
TAAATAAGGGAATTGGTGGAGAACATGGTATAGATAAATTrAATCCAGCAGGAGTTATACAAAATAGAAAAGATA 
AAAATACAACATCCCTGGATCAAAATCCAGAATTATTTGCTTTCAATAACGAAGGGATCAACGCTCCATCATCAA 

GTGGTTCTAAGATTGCTAACATTTATCCTTTAGATTCAAATGGAAATCCT
AACACCnTCTCCACTTCTATTAAGAAGTGCAGAAGA

AAATCAAAGAGACTTAAAAGTCATTTCGAGAGAACACTTTATTAGAGGAATTTTAAAT^ 
AAAGGGAATCAAATCATCTAAACTAAAAGTTTGGGGTGACTTGAAG 

TAGAGAAGAAAATGCACG\GAAAGTAAGGATAATCAAGATCCTGCTACTAAGATAAGAGGTCAATTTGAACCGAT 

TGCGGAAGGTCAATATTTCTATAAATTrAAATATAGATTAACTAAAGATTACCCATGGCAGGrrrCCTATATrCCT 

GTAAAAATTGATAACACCGCCCCTAAGATTGTTTCGGTTGATTTTTCAAATCCTGAAAAAATTAAGT^ 

AGGATACTTATCATAAGGTAAAAGATCAGTATAAGAATGAAACGCTATTTGCGAGAGATCAAAAAGAACATCCTG 

AAAAATTTGACGAGATTGCGAACGAAGTTTGGTATGCrGGCGCCGCTCTTGTTAATGAAGATGGAGAGGTTGAAA 

AAAATCTTGAAGTAACTTACGCAGGTGAGGGTG\AGGAAGAAATAGAAAACTTGATAAAGACGGAAATACCAT^ 

atgaaattaaaggtgcgggagatttaaggggaaaaatcattgaagtcattgcattagatggttctagcaatttca 
caaagattcatagaattaaatttgctaatcaggctgatgaaaaggggatgatttcctattatctagtagatcctga 

TCAAGArrCATCTAAATATCAAAAGCTTGGCGAGATTGCAGAATCTAAATTTAAAAATTTAGGAAA 

^ ^^L^SZ!?*^^ '^^^^^"^ ATACAACTGGGGTAG AmACATC ATCATC AAG AAA ATG AAG AGTCT ATTAAAG AAAAAT 

CTAGTTTTACTATTGATAGAAATATTTCAACAATTACAGACTTTC 

GAAATTTAGAGAAGTTGATGATTTTACAAGTGAAACTGGTAAGAGAATGGAGGAATACGATTATAAATACGATGA 

TAAAGGAAATATAATAGCCTACGATGATGGGACTGATCTAGAATATGAAACTGAGAAACTTGACGAAATCAAATC 

AAAAATTTATGGTGTTCTAAGTCCGTCTAAAGATGGACACTTTGAAATTCTTGGAAAGATAAGT^ 

AATGCCAAGGTATATTATGGGAATAACTATAAATCTATAGAAATCAAAGCGACCAAGTATGATTTCOVCTCAAAA 

ACGATGACATTTGATCTATACGCTAATATTAATGATATTGTGGATGGATTAGCTTTTGCAGGAGATATGAGATTAT 

TTGTTAAAGATAATGATCAGAAAAAAGCTGAAATTAAAATTAGAATGCCTGAAAAAATTAAGGAAA 

AATATCCCTATGTATCAAGTTATGGGAATGTCATAGAATTAGGGGAAGGAGATCrrTCAAAAAACAAACCAGACA 

ATTTAACTAAAATGGAATCTGGTAAAATCTATTCTGATTCAGAAAAACA 

TCTAAGAAAAGGCTATGCACTAAAAGTGACTACCTATAATCCTGGAAAAACGGATATGTTAGAAGGAAATGGAGT 

CTATAGCAAGGAAGATATAGCAAAAATACAAAAGGCCAATCCTAATCTAAGAGCCCTTTCAGAAACAACAATTTA 

TGCTGATAGTAGAAATGTTGAAGATGGAAGAAGTACCCAATCTGTATTAATGTCGGCTTTGGACGGCTTTAACATT 

ATAAGGTATCAAGTGTTTACATTTAAAATGAACGATAAAGGGGAAGCTATCGATAAAGACGGAAATCTTGTGACA 

G ATTCTTCT AAA CTTGT ATT ATTTGGT AAG G ATG AT AAAG A AT AC ACTGG AG AG G ATAAGTTC AATGT AG AAGCTA 

TAAAAGAAGATGGCTCCATGTTATTTATTGATACCAAACCAGTAAACCTTTCAATGGATAAGAACrACm 

ATCTAAATCTAATAAAATTTATGTACGAAATCCAGAATTTTATTTAAGAGGTAAGATTTCTG 

AACTGGGAATTGAGAGTTAATGAATCGGTTGTAGATAATTATTTAATCTACGGAGATTrACACXTTGATAACACT 

GAGATTTTAATATTAAGCTGAATGTTAAAGACGGTGACATCATGGACTGGGGAATGAAAGACTATAAAGCAAACG 

GATTTCCAGATAAGGTAA CAGATA TGGATGGAAATGTTTATCTTCAAACTGGCTATAGCGATTTGAATGCTAAAGC 

AGTrGGAGTCCACTATCAGTrrTTATATGATAATGTTAAACCCGAAGTAAACATTGATCCTAAGGGAAATACTAGT 

ATCGAATATGCTGATGGAAAATCTGTAGTCTITAACATCAATGATAAAAGAAATAATGGATTCGATGGTGAGATT 

CAAGAACAACATATTTATATAAATGGAAAAGAATATACATCATTTAATGATATTAAACAAATAATAGACAAGACA 

CTAAACATTAAGATTGTTGTAAAAGATTITGCAAGAAATACAACCGTAAAAGAATTCATTTTAAATAAAG 

GGAGAGGTAAGTGAATTAAAACCTCATAGGGTAACTGTGACCATTCAAAATGGAAAAGAAATGAGTTCAACGATA 

GTGTCGGAAGAAGATTTTATTTTACCTGTTTATAAGGGTGAATTAGAAAAAGGATACCAATTTGAT^ 

TTTCTGGTrrCGAAGGTAAAAAAGACGCTGGCTATGTTArrAATCTATCAAAAGATACCTrrATAAAAC 

CAAG A AAAT AG AG G AG AAAAAGG AGG AAG AAAAT AAACCT ACTTTTG ATGT ATCG AAAAAG AAAGAT AACCCAC 

AAGTAAACCATAGTCAATTAAATGAAAGTCACAGAAAAGAGGATTTACAAAGAGAAGAGCATTCACAAAAATCT 

GATTCAACTAAGGATGTTACAGCTACAGTTCTTGATAAAAACAATATCAGTAGTAAATCAACTACTAACAATCCT 

A ATAAG TTGCCAAAAACTGGAACAGCAAGCGGAGCCCAGACACTATTAGCTGCCGGAATAATGTTTATAGTAGGA 

ATTTTTCTTGGATTGAAGAAAAAAAATCAAGATTAA 

YPWLADTSSSEDALNISDKEKVAENKEKHENIHSAMETSQ 

^QGDYTDSFVNKOTENPKKEDICNfVYIAEFKDKESGEKAiKELSSLK^^ 

SVERAQKVQPMMWMRKHGVEEAIDYLKSINAPFGKNFIXjRGMVISNIDTGTDYR^ 

KGTDKNYWL5DKIPHAFhTVYNGGKn^EKYDDGRDYFDPHGMHIAGIl^GNDTTQDIKNFNGIDGIA 

SDAGSGFAGDETMFHA1EDSIKHNVDVVSVSSGFTGTGLVGEKYWQAIRALRKAGIPMVVATGNYATSASSSSWDLVA 

NNHUCMTDTGNVTRTAAHEDAIAVASAKNQTVEFDKVNIG^ 

GQDQDLIGLDLRGKMVMDRIYTKDLKNAFKKAM^^ 

SGDDGVKLWNMINPDKKTEVKRNNKEDFKDKLEQYYPIDMESFNSNKPNVGDEKEIDFKFAPDTDKELYKEDIIVPAG 

STSWGPRJDLLUCPDVSAPGKNIKSTLNVINGKSTYGYMSGI^MATPrVAASTVLIRPICLXEM 

TSLTKMLQNTARPMMDATSWKEKSQYFASPRQQGAGLINVANAUW^^ 

YfTTKLHhn-SNRPLTFKVSASAriTDSLTDRlJCU)ETYKDEKSPIX3KQIVPE 

AVINVGEAKNKNKFVESFIHFESVEAMEALNSSGmNFQPSLS^^ 

DDGKPKIPGTLNKGlGGEHGIDKFNPAGVIQNRKDKhrrTSLDQNPELFAFNNEGlNAPSSSGSKIANIYPLDSNGNPQDA 
QLERGLTPSPLVLRSAEEGLISIVrfimEGENQRDLKVISREHFlRGI^^ 

REENAPESKDNQDPATmGQFEPIAEGQYFnCFKYRLTKDYPWQVSYIPVKIDI^APKIVSVDFSNPEKK^ 
VKDQYKNETL^ARIXJKEHPEKFDEIANEVWYAGAALVNETCEVEK^ 

DLRGKIIEVIALTCSSNFTXIHRlKFANQADEKGMISYYLVDPDQDSSKYQKLGEIAESKFKNLGNGKEGSUa^ 
HHHQENEESIKEKSSFnDRNISTlRDFE>«DLKKLIKKKFREVDDFTSETGKRMEEYDYKYDDKGNIUYD 
TEKLDEIKSKIYGVl^PSKDGHFEILGKISNVSKNAiCVYYGNNYKSIEI^ 
DMRLFVKDNDQKKAEIKIRMPEKIKETK5EYPYVSSYGNVIELGEGDI^K^^ 
IRKGYAUCVTITNPGCTDMl^GNGVYSKE^
VFTTTCMNDKGEAIDKIXiNLVTDSSKLVLFGKDDKEYTGEDKFN^ 
YVRNPEFYUlGKISDKGGF^WE^WNESVVD^fYL^YGDLHID^^^^ 
MDGWYL<^GYSDLNAKAVGVHYQFLYDNVKPEVNIDPKG^^ 

E YTSFNDIKQnDKTLNIKIVVKDFARNTm pv 

GYQFTCWEISGFEGKKDAGYVINLSKDTFIKPVFKKIEEKKEEENKPTro^ 

SQKSDSTKDVTATVUDKNNISSKSTTNNPNKlJICTGTASGAQTtLAA 

ID6 597 bp



CTTGAATTAAATAAAAAACGTO\TGCCACT^ 

CTATCGAAATTGCAACCTTAGCGCCAAGCGCCCACAACAGCCAGCCTTGGAAATTTGTOT 
ATGCTGAACTGGCAAAGTTAGCITATGGTTCCAATTTTGAA^ 

TACAGAT ACGGA CITAGCCAAACGTGCTCGTAAGATTGCCCGTGTTGGTGGTGCTAATAACTTTTCT 
I J CTTCAATATTTTATGAAAAATCTGCCAGCTGAGTTTGCCCGTTACAGTGAGCAACAAGTCAG 
TCAATGCAGGTTTGGTTGCCATGAACTTCKjTTCTTGCATTGACAGACCAAGGAATTGGTTCT 

ttttgacaaatcaaaagttaatgaagttttggaaatcgaagaccgtttccgcccagaactct^ 
tatacagacgaaaaattggaaccaagctaccgcrtgccagtagatgaaat^tcgagaaaagatag 

LELhnCKRHATKHFTDKLVDPKDVRTAEUTlJ^

DLAKIUJ^IARVGGANNFSEEQLQYFMKNLPAEFARYSEQQVSDYI^NAGLVAMNLVLALTDQGIGSNIILGFDKSK 
VNEVLEIEDRFRPELLTTVGYTDEKLEPSYRLPVDEIIEKRZ 



ID7 1401 bp



ATG AC AG CAATTG ATTTT ACAGCAG AAGT AG AAA AA CG C AAAG AAG ACCTCTTG GCTC 

GAAATCAATTCAGAACGTGATGACAGCAAGGCTGATGCCCAGCATCCATTTGGGCCTGGTCCAGTAAAAGCCTTG 
GAGAAATTCCTTGAAATCGCAGACCGCGATGGCrACCCAACTAAGAATGTTGATAACTATGCAGGACATTTTGAG 
TTTGGTGATGGAGAAGAAGTTCTCGGAATCmGCCCATATGGATCrrGGTGCCTGCTG 
ACCCTTACACA CCAAC TATCAAAGATGGTCGCCTTTATGCGCGCGGGGCTTCGGACGATAAGGGTCCTACAA(^G
Ci 1UA rACrATGGTTTGAAAATCATCAAAGAATTGGGTCrTCCAACTTCTAAGAAAGTTCGCTTCATCGTTGGAAC 
AGACGAAGAATCAGGCTGGGCAGAGVTGGACTACTACTTTGAGCACGTAGGAC^ 

CTCACCAGATGCTGAATTTCCAATCATCAATGGTGAAAAAGGAAATATCACGGAATACCTCCACTTTGCAGGAGA 
AAATACAGGTGTTGCCCGTCTTCACAGCTrTACAGGTGGTTTACGTC 
AGTCGTTTCAGGTGACTTGGCTGACTTGCAAGCTAAACTAGATGCCTTTGTTGCAGAACACA

ACTCCAAGAAGAAGCTGGCAAATACAAGGTGACGATCATTGGTAAATCAGCCCACGGTGCTATGCCTGCTTCA 

TGTCAATGGCGCAACTTACCTTGCCCTCTTCCTCAGCCAGTTTGGCTTTGCrGGTCCAGCCAA^ 

ATCGCAGGTAAAATTCTCn'GAACGATCATGAGGGTGAAAATCT^ 

GCTCTTTCTATGAATGCCGGCGTCTTCCACTTCGATGAAACAAGTGCTGATAATACCATTGCCCTCAACATCCGCT 
ATCCAAAAGGAACAACrrCCAGAACAAATCAACTrCAATCCTTGAAAACTTGCC^
ACACGGTCACACGCCTCACTATGTGCCAATGGAAGATCCACTTCT 
AACTGGCTTTAAAGGTCATGAACAAGTCATCCGTGGTG^ 
CGGTGCTATGTTCCCAGACTCGATTGATACCATGCACCAAGCCAATGAAT^ 
GCAGCAGCAATTTATGCCGAAGCTATTTACGAATTGATCAAATAA 

MTAIDFTA£VEKRKEDU*ADLPSLI^NSERDDSKADAQHPFGPGPVKA1^KFLE1ADRDGYPTKNVDNYA 
GEEVLGlFAilMDWPAGSGWDTDPYTPTIKMRLYARGASDDKGm 
AX>MDYYFEHVGLAJCPDFGrePDA£FPHNGEKGNITEYLHFAGENTC^ 
QAKLDAFVMEHKUIGELQEEAGKYKVTIIGKSAHGAMPASGVNGAT^ 
ENLKUHVDEKMGAL5MNAGVITODETSADhrnALNmYPKGTSPEQ(KSlLENLPVVSVSLSEHGHTPHYVPMEDPLVQ
TLi^YEKQTGFKGHEQVIGGCnTGRIXERGVAYGAMFPDSIDTMH^^ 

tpg mi *P 

GTGTATACTATTATAAAATCAAATATAAAAAAATTTAGTTTATTAACGATATTTATTGTTGCT
AATTTATGCAGCAACTATTAATGCTCTGGTCTrTGAATGAATTAATTGCGATGAATTTAGAGC 
TCAATCT ACCA AATGATTGTCTGGTGTGGGATAATATTCCTTGACTGGGTAGTGAAAAATTATCAGGTTGAAGTGA 
TCCAAGAGTTTAATCTAGAGATTCGAAATAGAGTTGCCACAGACATCTCTAACTCTACCT^ 

£A TAAAT CATCAGGAACATATCTTTCGTGGCTAAATAATGATGTTCAGACTTT 

TTTTTAGTAATAAAAGGAATTTCTGGTACTATATTTGCAGTTGTG
AGCCACCTTGTTTTCATTAATGATTATGCTACTTGTACCAAA 

AATTT AACTA ACCAAAATGAAGCTTTTTTAAAATCTAGTGAGACTATATTGAATGGATTTG 
TGAATC i 1 1 1 ATATGTATTGCCTAAGAAAATTAAAGAAGCAGGAATTTTATTAAAGATGGTTATACAAAGAAAGA 
CAACTGTACAAACGTTAGCAGGCCCTATTACCTTCTTC^ 
GGCTATCTTGCAATAAAAGGAATAGTGAAAATTGGTACTATTC
ACAGCGCTAGGTGAATTAGGAGGTCAATTATCCTCTATTATTCCTACG

ttaatccaattgagtcaaataaaatgaatgatatcgaaccaaatgaggtgaatagagattttccgttatatgaag 
caaaaaat atttgct at aagt atgg ag at aaag aa atatt aaaaaactt aa a 1 1 1 1 igiiti caacgtaatgaaaa 
gtatttaattttaggtgaaagtggaagcgggaaatctac^ttatraaaattattg 

j agtggagaattgcgattctgcggggatgatataaaaaaaa(xtcctatitaaatatggtitcgaatgttctatatc 
tagatcaaaaa gctt atttgtttciaack}taccattagag 

aatactacagtctttagagcaagttggtttgagtgtaaaagatttrcctaataaca 

gatgatgggagattactgtcaggagggcagaaacaaaaaattactttagctagagggctaattagaaataagaa 

AATAGTATTAATTGACGAGGGAACTTCTGCTATCGATAGGAGAACTTCGTTAGCGATTGAACGTAAGATATTAGA 
TAGAGAGGATTTGACTGTCATTATTGTTACCCATGCTCCGCATCCGGAACTTAAAC
CAATTTCCAAAGGATTTTATTTAA 

MYTIIKShTOCKFSLLTlRVAGQLlXIYAATl^ 

EIRNRVATDISNSTYQEFHSKSSGTYL5WLNNDVQTTNDQAFKQLFLVIKGISGTIFAVVTLN 
LXVPKIFASKMREVSLNLTNQKEAFUCSSETIU^GFDVLASLKU^

FRJISLVFLTGYlJUKGIVKlGTIEAIGALTGVIFrALGELGGQLSSncnXPIFU^ 
AKOTCYKYGDKEILKNU4FCFQRNEKYULGESGSGKST^ 
KAYLFEGTIRDNILLEENYTDEEILQSl£QVGl^VKDFPNNlU5YYVGDDGRl^ 
S AIDRRTS LAIERKILD REDLTVIIVTHAPHPELKQ YFTKIYQFPKDFIZ 



[P?70S bp 



ATAACAGTTAAACAGATTATGGACGAAATAGCCGTTTCAGATATGACnXjCAAGGCGCTATTTACAGGAATTAGCT 
GATAAAGATTTGCTGATTCGTGTGCATGGTGGAGCTGAAAAACTTCGAACCAACTCCCTrTTGACTAATGAGCGAT 
CAAATATTGAAAA ACAA GCCCTCCAAACGGCAGAAAAACAAGAAATAGCCCATTTTGCAGGCAGTCTAGTAG^
GAAAGAGAAACTATTTTCATTGG ACCAGG AACAACATTAGACrm^ 
GCGTCGTAACCAACAGTCTACCTGTTTTTCTGATT^ 

AAATTATCGCGATATTACAGGTGCTTTTGTTGGTACATTGACCCTACAAAATCTCTCTAA 
GCTT TCGTT AGCTGTAATGGTATTCAAAACGGAGCTCTAGCTACTTTTAGCGAGGAAGAGGGAGAGGCT 
ATCGCTTTAAATAATTCTAATAAAAAATATTTACTCGCAGATCATAGCAAGTTCAATAAGTTTGATT^
TTATAATGTATCAAATCTTGATACTATTGTTTG\GArrCT 
ACATT AAAGTCATCAAGC CTTAA 

rTVKQIMDElAVSDMTARRYLQELADKDLURVHGGAEKLRTNSLLT^ 
HGPGTTLEFFARELPIDNIRVVTNSU'VFLIL^ERKLTDUUGGNYRDrTG
U\TFSEEEGEAQRIALNNSNKKYLIJU>H$KFNKFDF^ 

TPliO^ *?P 

ATGACTGAGTTTTCGTTAGATCTTCTTCTAGAAGCCATTAAACT

AGCTAGACAAAACAGATAAAGACCAAGAGCTTAAAACTGAAATTCAATCCATCTn'ATC 
ATGCTTATCGCCGGGTTCATTTAGAACTAAGAAATCGTGGTTATCTGGTAAATCATAAAAGAGTTCAAGGCTTGa 
GAAAGTACTCAAmACAAGCTAAAATGCGAAAGAAACGAAAATATTCTTCTCATAAAGGAGACGTTGGTAAGAA 
GGCAGAGAATCTCATTCAAGCCCAATTrGAAGGCTCTAAAACAATGGAAAAGTGCTACACAGATGTGACTGAATT 

TGC CATT CCAGCAAGTACTCAAAAGCTTTACTTATCACCAGTTTTAGATGGCTTTAACAGC^
AATCTTTCTTGTTCGCCTAATTTAGAATAA 

MTEFSLDLLLEAIKLARWTYYYHUCQU^KTDKI^ 

VLJ^QAKMRKKRKYS5HKGDVGKKAENLIQAQFEGSKTMEKCYTDVTEFA1PASTQKLYL^ 
PNLEZ

rPU 1266 bo 

CCAGGATTTGGTACCGTTGCAAGTGGTGTGCCTTTCCTCCT 

CATTCAGATATCAAAGTTGCTAAGGTATTGGTCAAGGATGAAGATGAAAAAAATCGCTTGCTTGCAGCAGGGAAT
GACTTTAACTTTGTAACCAATGTGGATGATATTTTATCAGACCAGGATATTACTATCGTAGTGGAATTGAT 
GTATTGAGCCTGCTAAAACCTTTATCACTCGTGCCTTGGAAGCTC^AAAACACGTT 
TTTAGCTGTCCATGGCGCAGAATITiCTAGAAATCGCTCAAGCTAACAAGGTAG 
TGCTGGTGGGATTCCAATTCnTCGTACTTTAGCAAATTCCTTGGCTTCTGATAAAAT^ 

GTCAACGGAACTTCCAACTTCATGGTGACGAAGATGGTGGAAGAAGGCTGGTCTTACGATGATGCTCTTGCGGAA
G CACA ACGTCTAGGATTrGCAGAAAGCGATCCGACGAATGACGTAGATGGGATTGATGCAGCCTACAAGATGGTT 
ATTTTGAGCCAATITGCCTTTGGCATGAAGATTGCCTTTGATGATGTAGCCCAC^ 

CAGAAG ACGTAGCTGTAGCTCAAG AGCTrGGTTACGTAGTGAAATTGGTTGGTTCTATTGAGGAAACri CI 1 CAGG 
TATTGCTGCAGAAGTGACTCCAACCTTCCTACCTAAAGCGCACCCACTTGCTAGTGTGAATGGCGTAATGAACGCT 
GTCTTTGTAGAATCTATCGGTATTGGTGAGTCTATGTACTACGGACCAGGTGCGGGTCAAAAACCAACTGCAACA
AGTGTTCTAGCTGATATTCTCOTATC

GCCGTGACTTGGTCnTGGCAAATCCTGAAGATGTCAAAGCAAACTACT 
AGGTCAGGTCTTGAAGTTGGCTGAAATCTTCAATGCTCAAGATATTTCCTTTAAGCAAAT 

GAGGGTGAC AAGG CGCGTGTCGTTATG^TCACAG\CAAGATTAATAAAGCCCAGCTTGAAAATGTCTG\GCTGAA 
TTGAAGAAGGTTTCAGAATTCGACCTCTTGAATACCTTCAAGGTGCTAGGAGAATAA 

PGFGTVASGVPFUJCENGGKINQSAHSDIKVA^ 

AKTFTTRAl£AGKHVyTANKDLi^VHGA£UXlAQANKVALYYEAAVAGGIP 
MVTKMVEEGWSYDDALAEAQRLGFAESDin^ 

UjYVVKLVGSrEETSSGlAAEVTPTnJKAHPLASVNGVMNAVFVESIGIGESM 
NDGTIGKDFNEYSRDLVLAi^PEDVKANYYFSIlj^LOSKGQVUCLAEIFNAQDIS 
QLENVSAELKKVSEFDLLNTFKVLGE2 

AT GAAACA CCTATTATCTTACTTCAAACCCTACATCAAGGAATCAATTTTAGCCCCCTTGT^ 
CTCTTTTTTGAGCTCTTGGTTCCCATt3GTGATTCCTGGGATTG 

TCTCTGGATGCAGATTGGCCTGCTCCTTATCTTTGCAGTAATTGGCGTTTTAGTGGCCTTGATAGCT 
CAGCAAAGGCAGCAGTAGGTTCrGCTAAGGAATTGACAAACGATCTTTATCGTCATATTCTTTCCT^ 
CAGCAGAGACCGTCTG ACAACT TCTAGTTTGGTCACTCGCTTGACTTCGGATACCTACCAGATTCAGACTGGTATC 
AATCAATTCCTGCGTCTCTTTTTACGAGCGCCCATTATCGTTTTTGGTGCCATTTTTATG^ 

TGAGTTGACTTTCTGG TTCTI AGTCTTGCnTGCCATTTTGACCATTGTCATTGTAGGGTTATCTCGATTGGTCAATC 
CTTTCTACA GTAGT CTCAGAAAGAAAACGGACCAACTGGTTCAGGAAACGCGCCAGCAATTGCAAGGGATGCGGG 
TTATTCGTGCTTTTGGTCAAGAAAAACGAGAGTTACAGATTTTT 

AGAAAAGACAGGmCTGGTCTAGTITATTAACACCTCTGACCTATCTGATTGTCAATGGAACTOT 

ATCTGG CAAG GCTATATTTCAATTCAAGGAGGAGTGCTCAGTCAAGGTGCTCTCATTGCTCTTATCAATTACCTCT 

TACAGATTTTGG TGGA ATTGGTCAAGCTAGCCATGTTGATCAATTCCCn'CAACCAGTCCTATATCTCAGTCAAGCG 

A ATCGA GGAAGTCTTTGTTGAGGCrCCAGAGGATATCCATTCAGAGTTAGAACAAAAGCAAGCTACCAGAGATAA 

GGTTTTACAAGTCCAAGAATTGACCTTTACCTATCCTGATGCGGCCC^ 

^'"'"^^^r^"^^^^* ^ C AAATTXCTT A GOT AXC-A TCG G GOO AA. CTG OTTCXG GX A AA X C AAG CTTXG GXG C AjVOTCTTT AOTTG 
GACTTTATCCAGTAGACAAGGGGAACATTGACCTITATCAAAATGGACGTACT^ 
GGTCTTGGATTGCCTATGTACCTCAAAAGGTCGAACTCTTTAAAGGAACC 
CAATCAAGAAGTATCTGACCAGGAACTCTGGCAGGCCTTGGAGATTGCGCAAGCT^ 

GGAAGGACTCTTGGATGCTCTAGTTGAGGCAGGGGGGCGAAATTTCTCAGGTGGACAAAAACAAAGATTGTCTAT 

CGCCCGAGCAGTCTTGCGCCAGGCTCCGTTTCTCATCCTAGATGATGCAACCTCGGCACTGGATACCATTACAGAG 

TCCAAGCTCTTGAAAGCTATTAGAGAAAATmCCAAACACGAGCTTAATTTTGATCTCTC^ 

TACAGATGGCGGACCAGATTCTCCTCTTGGAAAAAGGTGAGTTGCTAGCTGTTGGCAAGCACGATGACTTGATGA 

AATCCAGCCAAGTCTATTGTGAAATCAATGCATCCCAACATGGAAAGGAGGACTAG 

MKHLLSYFKPYIKESILAPLFKLLEAVFELLVPMV1AGIV 
AVGSAKELTNDLYRHU^LPKDSRDRLTTSSLVTRLTSDTYQI^^ 
VAILTIVIVGLSRLVNPFYSSLRKKTIXJLVQETRQQL^ 
TYLIVNCrrLLVIIWQGYISIQGGVI^QGAUALINYL^^ 

ATM)KVLQVQELTmPDAAQPSUlYISFDMTQGQILGnGGTGSGKSSLVQLLLGLYPVDKGNIDLYQNGRSPLNLEQ 

WRSWUYVPQKVEU^GTIRSNLTLGFNQEVSDQELWQALEIAQAKDFVSEKEGLLDALVEAGGRNFSGGQKQRLSIA 

RAVLRQAPFULDDATSAUDTTTESKLUCAIRENFP^ 

EINASQHGKEDZ 

ATGAAACGTTCTCTCGACTCAAGAGTCGATTACAGTTTGCTCTTGCCAGTATTTTTTCTACT 
GGCTATCTATATAGCCGTT AGTCA TGATTATCCCAATAATATTCTGCCCATTTTAGGGCAGCAGGTCGCCTGGATT 
G CCTTG GGGCTTGTGATTGGTTTTGTGGTCATGCrCTTTAATACAGAA ll'l C l' l 'I GG A AGGTGACCCCCTTTCT AT A 
TATTTTAGGCTTGGGACTTATCATCTTGCCGATTGTATTTTATAAT 

AACTGGGTATCAATAAATGGAATTACCCTATTCCAACCGTCAGAATTTATGAAGATATCCTATATCCTCATGTTGG 
CrCGTGTCATTG TCCA AmACAAAGAAACATAAGGAATGGAGACGCACGGTTCCGCTGGA CriMU -l UlU AAT^ 
CTGGATGATTCTCmACCArrCCAGTCCTAGTrCTTTTAGCACTTCAA^ 
TAGCCATTTTCTCAGGAATCGTITTATTATCAGGGGTTTCTTGGAAAATTATT^ 

ACAGGAGTTGCTGG 1 1 1 C irAG CTATCTTTATTAGCAAGGACGGACGAGCri'riCIU CACCAGATTGGAATGCCGA 
CCTACCAAATTAATCGGATTTTGGCTTGGCTCAATCCCTTTG 

AGGGCAGATTGCC ATTGG GAGTGGTGGCTTATTTGGTCAGGGATTTAATGCITCGAATCTGCrTATC 
GAGTCAGATATGATTTTTACGGTTATTGCAGAAGATTTTGGCTTTATTGGCTCTGTC 
CATGTTGATTTACCGTATGTTGAAGATTACTCrrAAATCAAATAACCAGrrCTACACTTATAm 
TTATGATGTTGCTCTTCCACATCTTTGAGAATATCGCrrGCT 

CCTTTCATTTCGCAAGGGGGATCAGCTATTATCAGTAATCTGATTGGTGTTGGTTTGCTTTTATCGATGA 
GACTAATCTAGCTGAAGAAAAGAGCGGAAAAGTCCCATTCAAACGGAAAAAGGTTGTATTAAAACAA
A 

MKRSLDSRVDYSlXLPVFFLLVIGVVArYIAVSHDYPNM^ 
D GLMILPrVFYNPSLVASTGAKNWVSlNGrTLFQPSEFMKISYILMLARVIVQF^ 
VLl^LQSDUSTALVFVAIFSGNLLSGVSWKIIttVFVTAVTC^ 
AQTTTYQQAQCQlAJGSGGLFCX3GFNASNLLIPVRESDMIFrVU 
. ISTGLJMMLLFHIFENIGANn*GIiPLTGlPLPFISQGGSAJlSNLIGVGlXl^MSYQTNLAEEKS 

ID22 987 bp

ATGGTGGCTAAGAAAAAAATCTTATTrmATGTG 
CCATTGTTTCAAATCTGGATCCAGAAAAGTATGATATTGATATTCT^ 
ATCTGTTCCAAj^GCATGTACGCATTTTAAAATCCCTTCAAGATT^ 
TGGAGAATGAGAATTTATTTTCCAAGACTGACTCGTCGTTTGCTTGTAAAAGATGATTATGATGTT^

TTACCATTATGAATCCACCACTGTTGTTCTCTAAAAGAAGAGAAGTCAAGAAGATATCTTGGATTCATGGAAGTAT 
TGAAGAAL i 1 Ui A^GGATAGCTCTAAAAGAGAATCACATAGAAGCCAGTTGGATGCTGCGAATACAATTGTAGG 
GATTTCA^\AAAAGACCAGCAj\TTCTATCAAGGAAGTTTATC 

GGATATGATTTTCAGACTATTCTAGAAAAATCTCAAGAGA^GATCGATATCGAGATTGCT^ 
IV CTATCCKjACGGATTG AGGAA AATA^GGCTTCTGACCGTCrrAGTGGAAGTGATACGATTATTAC^ 
AAAACTATCATCTCTATTTTATCGGGGCT^ 

ttgaggactatgtaca tttcct tggttatcaaaaaaatccttatcagtatctatctcagacgaaa g t r e I i ll GTCT 
atgtctaaacaagaaggttttcctggagtgtatgtggaggccttgagtct 

ttggaggggctgaggaattatcccaagaaggacgatttggacaaatcattgagagcaatcaagaggcagctcag 

gcgattactaattacatgacttctgcctcaaactttgatgtcg 

ttacaaaacaaatcgaacaagtagaaaaactattagaggagtag 

MVAKKKIIJTMWSFSLGGGA£KII^TTVShfLDPEKYDIDILEMEHFDKGYESVPKHVW 
RIYFPRLTRW-LVKDDYDVEVSFTIMNPPLI^SKRREVKKISWIHGSIEELUCDSSKRESHRSQL 
EVYPDYTSKLQTIYNGYDFQTILEKSQEKIDI^

KKRVKEYGIEDYVHFLGYQKNPYQYLSQTKVLLSMSKQEGFPGVYVEAXSLGLPFlSTDVGGAEELSQEGRFGQnESNO 
EAAQAITNYMTSASNFDVDEASQFIQQFTITXQIEQVEKLLEEZ 



ID23 1434 bp



ATGGAAACTGCATTAATTAGTGTGATTGTGCCAGTCTATAAT^ 
TTCAGAAGCAGACCTATCAAAATCTGGAAATTArTOT^ 

TGATTCA^TCGCTGAACAJ\GATGACAGGGTGTCAGTGCrTCATAAAj\AGAACGAAGGATTGTCGCAA^ 
TGATGCGATGAAGCAGGCTCACGGGGATTATCTGATTTTTATTGACTCAGATGATTATATCCATCCAGAAATGATT 

CAGAGCTTATATGAGCAATTAGTTCAAGAAGATGCGGATGTTTCGAGCTGTGGTGTCATGAATGTCTATGCTAATG
ATGAAAGCCCACAGTCAGCCAATCAGGATGACTATTTTGTCTGTGATTCTCAAACATTTCrA 
AGGTGAAAAAATACCTGGGACGATTTGCA ATAAG CTAATCAAGAGACAGATTGCAACTGCCCTATCCTTTCCTAA 
GGGGTTGATTTACGAAGATGCCTATTACCATTTTGATTTAATCAAGTTGGCCAAGAAGTATGTGGTTA^ 
CCCTATTATTACTATTrCCATAGAGGGGATAGTATTACGACCAAACCCTATGCAGAGAAGGATTTAGCCTATATTG 

ATATCTACCAAAAGTTTTATAATC

CTATG CCCACi iui i ATTCTGG ATAA GATGTrGCTAGATGATCAGTATAAACAGTTTGAAGCCTATTCTCAGATT 
CATCG i 1 1 1 I 1 AAAAGGCCATGCCTTTGCTATTTCTAGGAATCCAj\TTTTCCGTAAGGGGAGAAGAj\TTAGTGC^ 
TGGCCCTATTCATAAATATTTCCTTATATCGATTCi™ 

METALISVIVPVYNVAQYLEKSIASIQKQTYQNLEIILVDDGATDESGR^^ 

KQAHGDYUFIDSDDYIHPEMIQSLYEQLVQEDADVSSCGVMWYANDESPQSANQDDYFVCDSC3TFLKEYLIGEK1PG 
TICNKLKRQIATALSITCGLIYEDAYYHFDUKIAK^ 
KNYPDLKEVAFFRLAYAHFFILDKMUDDQYKQFEAYSQIHRFLKGHAFAB^ 
NIEKSKKLHZ

ATGAGAATG\AAGAGA AAACC AATAATATTAATGGAGGAATAAAAAATGTAAGTAAGCATTATGGTCATTCAATC
ATTCTCAAAGATATAAATTrTGCACTTAACAAGGGTGAAATTGTTGGTCrAGCAGGGAGAAATC 
AGTACGTTGATGAAAATTCTTGT TCAGA ATAATCAACCGACTTCAGGTAATATTATAAGCAGTGATA7VTGTTGGOT 
ATTTAATCGAAGAACCAAAATTATTTTTATCTAAAACA 

TGTTGACTACAATCAAGAAAGATTTAGATGTTTGATCCAAGAGTTAGATTTGACTCAGTCT 
AAGACCTATTCTTTGGGTACAAAACAAAAATTAGCTTTGCTTCTAACTCTCGTT
TAGATGAACCGACTAATGGTTTAGATATTGAATCATCACAAATAGTTTTAGCGG

TGAAAATGT GGGAA TmAATATCGAGTCATAAATTAGAAGACArrGAAGAAATTrGTGACAGAGTTCTm 
GAGAACGGGCTTTTG ACATTTCAAAAAGTAGGAAAAGATAGTCATAA 1 1 1U l U n' l 'GAGATAGCTTTTTCATCAG 
CTACAGATAGAGACATTTTCATTACCAAACAAGAATTTTGGGATATTGTTTAG 

MRKEKTNNINGGIKNVSKHYGHSUIJCDINFALNKGEIV^ 
KUT^KTGUNLKYL^NLYGVDYNQERFRCLIQEUJL^ 
SSQrVl^VUCKDVLHENVGILISSHKLEDlEEK^ 

ID25 1704 bp

ATGACTGAATTAGATAAACGTCACCGCAGTAGCATTTATGACAGCATGGTTAAATCACCTAACCGTGCTATGCT 
GTGCGACTGGTATGACAGATAAGGACTTTGAAACATCGATTGTGGGAGTGATTTCGACTTGGGCGGAAAATACAC 
CATGTAACATTCACTTGCATGATTTCGGGAAACTGGCTAAAGA 
ID AGTTTGGAACCATTACCGTAGCGGACGGGATCGCTATGGGAACGCCTGGTATtjCGTTTCTCTCTAACAT 

CATCATCGCGGACTCCATCGAGGCGGCTATGAGTGGTCACAACGTGGATGCCTTCGTCGCTATCGGTGGCTGTGA 

CAAGAACATGCCTGGATCTATGATTGCTATTGCTAATATGGATATCCCAGCTATTTTCGCCTATGGTGGAACT 

GCACCGGGAAATCTTGATGGTAAAGATATCGACTTGGTTrCT 

GACATGACAGCTGAGGACGTGAAACGTCTTGAATGTAATGCCTGCCCTGGCCCTGGTGGTTGTGGTGGTATGTAT 

ACTGCTAATACCATGGCAACTGCTATCGAAGTTCTAGGGATGAGTTTGCCAGGGTCATCCTCTGACCCAGCTGAAT
CAGCTGATAAGAAAGAAGATATCGAAGCAGCAGGACGTGCTGTTGTTAAGATGTTGGAACTTGGTCTCAAACG\T 
CAGATATCTTGACTCGTGAAGCCTTTGAAGATGCTATCACTGTAA 
TCTTCACTTGCTCGCCATTCCCCATGCCGCAAATGTTGACTTGTCACX^ 
GTGCCTCACTTGGCCGACTTGAAACCATCTGGTCAGTATGTCT 

CGGTTATGAAGT ATTT GTTGGCAAATGGTTTCCTTCACGGAGATCGCATCACATGTACTGGTAAGACrGTAGCTGA
AAACTTGGCTGACTTTGCAGACTTGACTCCAGGCCAAAAAGTTATCATGCCACTTGAAAATCCA 
TGGTCCGCTTATCATCTTGAACG GGAA CCTTGCTCCTGACGGTGCAGTTGCCAAGGTATCAGGTGTTAAACT 
CGTCACGTTGGGCCAGCTAAGGTCTTTGACTCAGAAGAAGATGCGATTCAGGCCGTTCTGACAGATGAAATCGTT 
GATGGCGATGTAGTCGTTGTTCGTTTTGTTGGACCTAAAGGTGGTCCTGGTATGCCTGAGATGCTATCACTr^ 

AATGATTGTTGGTAAAGGTCAGGGAGATAAGGTGGCCCTCTTGACGGACGGACGTTTCTCTGGTGGTACTTATGGT
CTGGTTGTTGGACATATCGCTCCTGAAGCTCAGGATGGTGGACCAATTGCCTATCTCCGTACCGGCGATATCGTTA 
CGGTTGACCAAGATACCAAAGAAATTTCTATGGCCGTATCCGAAGAAGAACTTGAAAAACGCAAGGCAGAAACA 
ACCTTGCCACCACTTrACAGCCGTGGTGTCCTCGGTAAATATGCCCACATCGTATCATCTGCTTCACGCGGAGCCG 

^ TGACAGACTTCTGGAATATGGACAAGTCAGGTAAAAAATAA 

MTEI^KRHRSSIYDSMVKSPNRAMUIATGMTDKDFETSIVGVISTWAENTPCNIHLHD 

Tn^ADGIAMGTPGMRFSLTSRDIlADSIEAAiMSGHNVDAFVAIGGCDKNMPGSMIAlANMDIPAIFAYGGTIAPGNLDG 
KDIDLVSVFEGIGKWNHGDMTAEDVKRLECNACPGPGGCGGMYTAhrTMATAmVLGMSLPGSSSHPAESADKKEDIE 
AAGRAVVKMl^LGLKPSDaTREAreDArrVTMALGGSTNATLHUJVlAHAANVDLSLEDFhm 
YVFQDLYEVGGVPAVMKYLDVNGFLHGDRnCTGKTVAENL^

AVAKVSGVKVRRHVGPAKVFDSEEDAIQAVLTDEIVDGDVWVRFVGPKGGPGMPEMLSLSSMIVGKGQGDKVALLT 
DGRFSGGTYGLVVGHIAPEAQDGGPIAYLRTGDIVTVDQDTKBSMAVSEEELEKRKAETTLPPLYSRGVLGKYAHTVSS 
ASRGAVTDFWNMDKSGKKZ 

ID26 274 bp

ATGTTATAATAAAAATAAAGAATTTAAGGAGAAATACAATATGTCAATTITrATTGGAGGAGCATGGCCATATC 
AAACGGTTCGT TACAT ATTGGTCACGCGGCAGCGCTTTTACCGGGGGATATTCTTGCAAGATACTATCGTCAG^ 
GGGAGAGGAAGTTTTATATGTTTCTGGAAGTGATTGTAATGGAACCCCTATTTCTATCAGAGCTAAAA^ 
- J\) T AAGTCTGTG AAAG AAATTG CTG ATTTTT ATCAT AAGG AATTT A ATCCA 

CYNK^EFKEKYNMSinGGAWPYANGSLHIGHAAALLPGDILARYYRQKGEEVLYVSGSDCNGTPISIRAKKENKSVK 
EIADFYHKEFNP 

ID28 1065 bp

ATGACAACATTArmCAAAAATTAAAGAAGTAACAGAACTTGCTGCAGTCTCAGGTCATGAAGCGCCTGTCCCT 

GCTTATCTTCGTGAAAAGTTGACACCGCATGTGGATGAAGTGGTGACAGATGGCTTGGGTGGTATTTTTGCT 

AACATTCAGAAGCTGTGGATGCACCGCGCGTCTTGGTCGCTTCTCATATGGACGAAGTTGGTTTTATGGTCAGCGA 

AATCAAGCCAGATGGTACCTTCCGTGTCGTAGAAATCGGTGGCTGGAACCCCATGGTGGTTAGCAGCCAACGTTT
CAAACTCTTGACTCGTGATGGTCATGAAATTCCT GTGATT TCAGGTTCTGTTCCTCCGCATTTGACTC 
GGGGGACCAACCATGCCAGCCATTGCCGATATCGTTTTTGATGGTGCITrrGCGGACAAGGCTGAGGCAGAAAGT 
TTTGGCATCCGTCCTGGTGATACCATTGTACCAGATAGTTCTGCAATTTTGACAGCCAATGAAAAAAATAT^ 
CAAAAGCTTGGGATAACCGCTACGGTGTCCTCATGGTAAGCGAGCTAGCTGAAGCTTTATCGGGTCAAAAACTCG 

GCAATGAACTCTATCTGGGTTCTAACGTCCAAGAAGAAGTTGGTCTGCCn'GGCGCTCATACCTCTACAACCAAGTT






WO 00/06738 



PCT/GB99/02452 




TGACCCAGAACTTCTTCCTCGCAGTTC^ 
TGGAACCrTGATTCGTTTCTATGATCCAGGTCACTTC 

GAAGAAGCTGGTATCAAGTACCAATACTACTGTGGTAAAGGCGGAACAGATGCAGGTGCAGCTCATCTGAAAAAT 
GGTGGTGTCCCATCAAC A ACTATCG GTGTCTG CGCTCGTT ATATCCATTCTCACCAAACCCTCTATGCA ATGG ATG 
ACTTCCTAGAAGCGCAAGCi 1 IL1 lACAAGCCTTGGTGAAGAAATTGGATCGTTCAACGGTTGATTTGATTAAACA 
TTATTAA 

MTTU^KKEVTELAAVSGHEAPVRAYLREKLTPHVDEVVTDGLGGIFGIKHSEAVDAPRVL^ 

DGTFRVVEIGGWNPMWSSQRFKLLTRTCHEIPVISGSVPPHLTRGKGGPTMPAlADIVFDGGFADKAEAESrc 
IVPDSSAILTANEKNIISKAWDNRYGVLMVSEIj^^ 

PAGDVYGG(^KIGDGTLIRFYDPGH1XLPGMKDFLLTTAEEAGIKYQYYCGKGGTDAGAAHUCNGGVP 
IHSHQTLYAMDDFLEAQAFLQALVKKLDRSTVDLIKHYZ 



ATGGAATTTTCTATGAAATCAGTCAAAGGACTACTCm 

GjV^CACTTCTCCCCAATTCATGATTCCAGGACTAGCTTTAACAAGCCTATCTCTGACTTTTATCCT 

ctcccactactag aaag ctggtttcacagtttggagaaggtctacacc^^ 
tcatcctactaatctttcataactttagtatgggcggtttgtggggctctcgcrtagct 

GCCATCTATATCTTTGCCAGCATCATCCTTGTCGCCT

TTCACCGCCTGGTTTACCTAGCCTATATTTTAGGACTCTTTCACA 
TTTAATCTTCTAAGTTTTCTTGTTGGTAGCTATGC^ 

CAAAAGATTTCCTTCC CCTA TCTAGGGAAAATTACCCATCTCAAACGCTTAAATCACGATACrAG 
ATCC ATCTT AGCA G AC CT TTCAACT ATC A-^TCAGG AG\ATTTG CCTTTCTAAAG ATTT^ 
GTGCTCCGCATCCCTTTTCTATCTCAGGAGGTCATGGTCAAACTCTITACrrrA

TACCAAGAATATCTATGATAATCTTCAAGCCGGCAGCAAAGTAACCCTAGACAGAGCTTACGGACACATGATCAT 
AGAAGAA GGACG AGAAAATCAGGTTTGGATTGCTGGAGGTATTGGGATCACCCCCTTCATCTCTTACATCCGTGA 
ACATCCTATTTTAGATAAACAGGTTCACTTCTACTATAGCrrCCGTGGAGATGAAAATGCAOT 
CTCCGTAACTATGCTCAGAAAAATCCTAATTTTGAACTCCATCT 

TTGAACAAAAAGAAGTGCCCGAACATGCAACCGTCTATATGTGTGGTCCTATTTCTATGATGAAGGCA 
AACAGATTAAGAAACAAAATCCAAAAACAGAGCATATTTAC 

MEFSMKSVKGLLFIIASFILTLLTWMOTSPQFMIPG^ 

WSMGGLWGSRLAAQFGNLA1YIFASIILVAY1XKYTQYEAWRWIHRLVYLAYILGLFHIYMIMGNR1X 
YAI^GLLAGFniFLYQKlSFPYLGOTHLKRLNHDTRE^

FTVKTSGDHTKNIYDNLQAGSKVTLDRAYGHMnEEGRENQVWIAGGIGITPHSYIREHPILDKQVHFYYSFRGDENAV 
YLDLUWYAQKNPNFEUiLrDSTKlXJYLNFEQKEVPEHATVYMCGPISMM 



ID32 900bp 



atg ac l 1 1 1 aaatcaggctttgtagccattttaggacgtcccaatgttgggaagtcaaccitm 
tggggcaaaagattgc catc atgagtgacaaggcgcagacaacgcgcaataaaatcatgckjaatitacacgactg 
ataaggagcaaattgtctttatcgacacaccagggattcacaagcctaaaacagctctcggagatttcatgg 
agtctgcctacagtacccttcgcgaagtggacactgttcttttcatggtgcctgctgatgaagcgcgtggtaaggg 
ggacgatatgattatcgagcgtctcaaggctgccaaggttcctgtga

catccagaccagctcttgtctcagattgatgacttccgtaatcaaatggactttaaggaaattgttcc 
cccttcagggaaataacgtgtctcgtctagtggatattttgagtgaaaatctggatgaaggtrt 
gtctgatcaaatcag\gaccatccagaacgtttcttggtttcagaaatggttcgcgagaaagtcttgcacct 
cgtgaagagattccgcattctgtagcagtagttgttgactctatgaaacgagacgaagagacagacaaggttcac 

ATCCGTGCAACCATCATGGTCGAGCGCGATAGCCAAAAAGGGATTATCATCGGTAAAGGTGGCGCTATGCTTAAG
AAAATCGGTAGCATGGCCCGTCGTGATATCGAACTCATGCTAGGAGACAAGGTCTTCCTAGAAACCTGGGTCAAG 
GTCAAGAAAAACTGGCGCGATAAAAAGCTAGATTTGGCTGACTTTGGCTATAATGAAAGAGAATACT^ 

MTFKSGFVAILGRPNVGKSTFLNHVMGQKIAIMSDKACTTC 
TLREVDTVLFMVPADEARGKGDDMIIERLKAAKVPVILVVNKID^^^

RLVDILSENUJEGFQYFPSDQrTDHPERFLVSE^REKVLHLTREEIPHSVAVVVDSMKRDEETDKVHIRATIMVERDSQ 
KGlIIGKGGAMUGGC^MARRDIElJ4LGDKVFLETWVKVKKNWRDKJaDLADK}YNER£YZ 



ID33 855bp



CTGCi ici iGT'rriM'ACAGAAGGAGGACTTATGCCTGAATTACCTGAGGTTGAAACCGTITGTCGTGGCrTAGAAA 
AATTGATTATAGGAAAGAAGATTTCGAGTATAGAAATTCGCTACCCCAAGATGATTAAGACGGATTTGGAAGAGT 
TTCAAAGGGAATTGCCTAGTCAGATTATCGAGTCAATGGGACGTCGTGGAAAATATTTGCTTTTTTA 
CAAGGT CTTGATTTC C CATTT GCGGATGGAGGGCAAGTATTTTTACTArCCAGACCAAGGACC^ 
GCCCATGTriUCri'1'CATTTTGAAGATGGTGGCACGCTTGTTTATGAGGATGTTCGCAAGTTTGG
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15 



tcttggtgcctgaccttttagacgtctactttatttctaaaaaatta 

tttacaggtctttcaatctgcccttgccaagtccaaaaagcctatcaaatcccatctcctagaccagacc^ 

GCTGGACTTGGCAATATCTATGTGGATGAGGTTCTCTGGCGAGCTCAGGT^ 

TGACAGCAGAAGAAGCGACTGCCATTCATGACCAGACCATTGCrGTTTTGGGCCAGGCT 

CCACCATTCGGACrTATACCAATGCCTTTGGGGAAGATGGAAGCAT 

CTGGTCAAGAATGTGTACGCTGTGGTACCATCATO 

AAACTGTCAAAGGAGGGACTGA 

MIXVFTCCGLMPELPEVETVCRGLEmiGmSSIEIRYPKMIlCTD 
RMEGKYFYYPDQGPERKHAHVFraFEDGGTLVYEDVRKFGTMEIXVPDlX^ 
K5KKPIKSHLLDQTLVAGLGNIYVDEVLWRAQVHPARPSQTLTA£EATAmDQmVLGQAV 
GSMQDFHQVYDKTGQECVRCGTIIEKIQLGGRGTHFCPNCQRRDZ 



TTGTCCAAACTGTCAAAGGAGGGACTGATGGGAAAAATCATCGGAATCACT 

ACTCrGACAAATTT TCTA AGACAGCAAGGCnTTCAAGTAGTGGATGCCGACGCAGTCGTCCACCAACT^ 

CCTGGTGGTCGTCTGTTTGAGG CTCTA GTACAGCACTTTGGGCAAGAAATCAT^ 

GCCCTCTCCTAGCTAGTCTCATCTTTTCAAATCCTGATGAACGAGA^ 
CCGTGAGGAACTGGCTACTTTGAGAGAACAGTTGGCTCAGACAGAAGAGATTTTCTTCATGGATATTCCCCTA

TTTGAGCAGGACTACAGCGATTGGTTTGCTGAGACTTGGTTGGTCTATGTGGACCGAGATGCCCAAGTGGAACGC 

TTAATGAAAAGGGACCAGTTGTCCAAAGATGAAGCTGAGTCTCCrrCTGGCAGCCCAGTGGCCmAGAAAAAAAG 

AAAGATTTGGCCAGCCAGGTrCTTGATAATAATGGCAATCAGAAC^ 
AGGGAGGTAGGCAAGATGACAGAGATTAA

MSKI^KEGLMGKnGITGGLASGKSTVTNFlJlQQGFQVVDADAW 

ASUFSNPDEREWSKQlQGElIREEl^TLREQLAQTEEn^DIPLLFEQDYSDW^ 

KDEAESRLAAQWPLEKKKDLASQVLDNNGNQNQLLNQVHILLEGGRQDDRDZ 

30 yogg 

TTGATAATAATGGCAATCAGAACCAGCITCTTAA^ 
GAGATTAACTGGAAGGATAATCTGCGCATO

TACCTTTTATGCCCATCTTCGTGGAAA ATCTA GGTGTAGGGAGTCAGCAAGTCGCrrTrrATGCAG 
TTCTGTCTCTGCTATTTCCGCGGCGCTCrrrTCTCCTATrrGGGGTATTCT^

TGATGATTCGGGCAGGTCTTGCTATGACTATCACTATGGGAGGCTTGGCCTTTC 
CTTTCTTCGTTTACTAAACGGTGTATTTGCAGGTTTTG^ 

AAGGAGAAATCAGGCTCTGCCTTAGGTACTTTGTCTACAGGCGTAGTTGCAGGTACTCTAACTGGT^ 
A A GTGGCTTTAT CGCA GAATTATTTGGCATTCGTACAG 1 1'1'lCl'l ACTGGTTGCTAGTTTTCTATTTTTAGCTGCTATTT 

TGACTATTTGCTTTATCAAGGAAGATTTTCAACCAGTAGCCAAGGAAAAGGCTATTCCAACAAAGGAAT^

CTXAAAT ATCCCTATCTTTTG CTC AATCTCTTTTT AACC AGTTTTC3TC ATCCAATTTTC AGCTC AATCG ATTG 

GCCCT ATTTTGGCTCTTT ATGT ACGCG A CTT AGGG C AG AC AG AG AATCTTCTTTTTGTCTCTGGTTTG ATTGTGTCC 

AGTATGGGL i i 1 1 CC AG CA TG ATG AGTGCAGGAGTCATGGGCAAGCTAGGTG ACAAGGTGGGCAATC ATCGTCTC 

TTGGTTOTCGCCCAGTTTTATTCAGTCATCATCTATCTCCrCTGTGCCAAT^ 
CTATCGTTTCCTCTTTGGATTGGGAACCGGTGCCTTGATTCCCGG

AAAGCCGGCATTTCGAGGGTCnTrGCCTTCAATCAGGTATTCTTTTATCTGGGAGGTGTTGT^ 

GTTCTGCAGTAGCAGGTCAATTTGGCTACCATGCTGTCTTTTATGCGACAAGCCTTTOT 

TTTAACCTGATTCAATTTCGAAC\TTATTAAAAGTAAAGGAAATCTAG 

" 50 WIMAmTSFLIKCtSFLREVGKMTEINWKD^^ 

AAlJSPIWGIIjVDKYGRKPMMIRAGl^MTrrMGGLAFVPNIYWLIFLRUNGVFAGFVPNATALlASQVPKE^ 
- Tl^GVVAGTLTGPFIGGFIAELFGIRTVFLLVGSFLFLAAlLTICFnCEDFQPVAKEKAIPTKELFTS 
FVIQFSAQSIGPIl^LYVRDLGQTENLLFVSGLrVSSMGFSSMMS 

SPLQLGLYRFLFGLGTGALIPGVNALLSKMTPKAGISRVFAFNQVFFYLGGVVGPMAGSAVAGQFGYHAVFYATSLCV 
AFSCLFNLIQFRTLLKVKEIZ

ATGGCCCTACCAACTATTGCCATTGTAGGACGTCCCAATCrrTGGGAAATCAACCCTATTTAATCGGATCGCT 
AG CGAATC TCGVTTGTAGAAGATGTCGAAGGAGTGACACGTGACCGTATTTATGCAACGGGTGAGTGGCTCAATC 
GTTCTTTTAGCATGATTGATACAGGAGGAATTGATGATGTCGATGCTCCTTTCATGGAACAAATCAAGCACCAGG^
AGAAATTGCCATGGAA GAAG CAGATGTTATCGTTTITGTCGTGTCTGGTAAGGAAGGAATTACTGATGCAGACGA 
ATACGTAGCTCGTAAGCTTTATAAGACCCACAAACCAGTTATCCTCGO^GTCAACAAGGTGGACAACCCTGAGAT 
GAGAAATGATATATATGATTTCTATGCTCrCGGTrTGGGTGAACCATTGCCTATCT 

ACACKjGGATGTGCTAGATGCGATCGTAGAAAATCTTCCAAATGAATATGAGGAAGAAAATCCAGATGTCATTAAG 
TTTAGCTTGATTGGTCGTCCTAACGTTGGAAAATCAAGCTTGATCAATGCTATCTTGGGAGAAGACCGTG
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^II GATACGGCTGOTATGCGT ^ 

CGTGCTATTGACCGTrCAGATGTGGTOT^ 

ATCGCAGGATTTGCCCATGAAGCTGGTAAAGGGATGATTATCGTGGTCAACAAGTGGGATACGCTTGAAAAAGAT 

AACCACACTATGAAAAACTGGGAAGAAGATATCCGTGAGCAGTTCCAATACCTGCmACGCA^ 

GTATCAGCITTAACCAAGCAACGTCTCCAG\AACTTCCTGAGATGATTAAGCAAATCAGCGAAA 

CGTATTCCATCAGCTGTCTTGAACGATGTCATCATGGATGCCATC 

AAACGTCTCM^ATTTTCTATC 

TCATCTCATCGCA AG AAAACGCAAATAA 1 

f^UTIAIVGRPNVGKSTUT^RIAGERlSIVEDVEGVTO 
EEADVTVFVVSGKEGrTOAGEYVAKiaYl^ 

VENLPNEYEEENPDVIKFSLIGIU*WGKSSLINAILGEDRVlASPVAGTTRDAiDTHFTOTTC 
YE^KYSVMRAMittlDRSDWLMVINAEEGIREYD^ 

^Y^PYAPIIFVSALTKQRLHKLPEMIKQISESQNTRIPSAVLNDVIMDAIAIN 
NEEELMHFSYLRFLENQIRJCAFVFEGTPIHLIARKRKZ 

ID37 714bp

ATGACAGAAACCATTAAATTGATGAAGGCTCATACTTCAGTGCGG\GGTTTAAAGAGCAAGAAATTC^ 

GACITAAATGAGATTTTGACAGCAGCCCAGATGGCATCATOTGGAAGAATTTCCAATCCT^ 

TACGAAGTCAAGAGAAGAAAGATGCCTTGTATGAATTGGTACCTCAAGAAGCCATTCGCCAGTCTGCTGTTTTCCT 

TCTCTTTGTCGGAGATTTGAACCGAGCAGAAAAGGGAGCCCGACTTCATACCGACACCTTCCAACCCCAAGGT 

GG AAGGTCTCTTGATTAGTTCGGTCGATGCAGCTCTTGCTGGACAAAACGCCTTGTTGGCAGCrGAAAG 

TATGGTGGTGTGATTATCGGTTTGGTTCGATACAAGTCTGAAGAAGTGGCAGAGCTCTTTAACCTACCTGACrACA 

CCTA7TCTGTCTTTGGGATGGCACTGGGTGTGCCAAATCAACATCATGATATGAAACCGAGACTGCCACTAGAGA 

ATGTTGTCITTGAGGAAGAATACCAAGAACAGTCAACTGAGGCAATCCAAGCTrATGACCGTGTTCAGGCTGACT 

ATGCTGGGGCGCGTGCGACGXCAAGCTGGAGTCAGCGCCTAGCAGAACAGTTTGGTCAAGCTGAACCAAGCTCAA 

CT AG AAAAAATCTTG A ACAG AAG AAATT ATTGTAG 

^ETIKI^MKAHTSVRRFKEQEIPQVDLNEILTAAQM 

GDLNRAEKGARLHTDTFQPQGVEGLUSSVDAAl^GQNALlJ^ES^^ 

MALGVPNQHHDMKPRLPLENVVFEEEYQEQSTEAIQAYDRVQADYAGARATO 

tP?g7?9bp 

ATGACAGAAATTAGACTAGAGCACGTCAGTTATGCCTATGGTCAGGAGACKjATTTTAGAGGATATCAACCT^ 

^tl^ZIT^^^^^^ A ^ GC CCAAOTGGTCTTXGO AAAG ACCAC CCXCTTT AmAXCT AAXCG CXG 

GGATTTTAGAAGTTCAGTCAGGGAGAATTGTCCTTGATGGTGAAGAAAATCCCAAGGGGCGCGTGAGTTATATGT 
TGCAAAAGGATCTGCTOTGGAGCACAAGACGGTGCrrGGAAATATCATrCTGCCCCTCTTG 
ATAAGGCAGAAGCTATTrCCCGAGCGGATAAAATrOTGCGACCTTCCAGCTGACAGCTGTAAGAGACAAGT^ 
CTCATGAACTTAGCGGTGGGATGCGCCAGCGTGTAGCCTTACTCCGGACCTACCTTTTTG 

CTrAGATGAGGCCmAGCGCCTTGGATGAGATGACAAAGATGGAACTCCACGCTTGGTATCTTGAGATTCA^ 
GCAGTTGCAGCTAACAACCCTGAT CATC ACGCATAGTATTGAGGAGGCCCTCAATCTCAGCGACCGTATCTATATC 
TTGAAAAATCGCCCTGGGCAGATTGTTTCAG AAATT AAACTAGATTGGTCTGAAGATGAGGACAAGGAA 
AAGATTGCCTACAAACGTCAAATTTTGGCGGAATTAGGCTTAGATAAGTAO 

MTEIRLEHVSYAYGQERIL£DINLQVTSGEVVSILGPSGVGKTTLFNUAGILEVQSGR^ 

LEHKTVLGNIILPLUQKVDKAEAISRADKJLATFQLTAVRDKYPHEl^GGMRQRVALUlTYLFGHKLi^ 

^MELHAWYLEIHKQLQLTrunWIEEALNL^DRIYIUCNRPGQIVSElKLDWS 

ATGAACTATTCAAAAGCATTGAATGAATGTATCGAAAGTGCCTACATGGTTGCTGGACATmGGAGCTCGTT 

TAGAGTCGTGGCAC TTGTT GATTGCCATGTCTAATCACAGTTATAGTGTAGCAGGGGCAACTTTAAATGATTATCC 

GTATGAGATGGACCGTTTAGAA^GGTGGCTTTGGAACT 

GGAATTGCCGTTCTCCCGTCGTTTGCAGG 1 1 U 1TIT 1 GATGAAGCAGAGTATGTAGCGTCAGTGGTCCATGCTAAG 
GTACT AGGGA CAGAGCACGTCCTCTATGCGATTTTGCATGATAGCAATGCCTTGGCGACTCGTATCTTGGAGAGG 
GCTGGI 1 i i i i ATG AAG ACAAGA AAGA TC AG GTC AAG ATTGCTGCTCTTCGTCGAAATTT AG AAG AACGGGCA 
GGCTGGACTCGTGAAGATCTCAAGGCTTTACGCCAACGCCATCGTACAGTAGCTGACAAGCAAAATTCTATGGCC 
AATATGATGGGCATGCCGCAGACTCCTAGTGGTGGTCTCGAGGATTATACGCATGATTTGACAGAGCAAGCGCGT 
TCTGGCAAGTTAGAACGVGTCATCGGTCGGGACAAGGAAATCTCACGTATGATTCAAATCTTGAGCCGGAAGACT 
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AAGAACAACCCTGTCTTGGTTGGGGATCCTGGTGTCGC^ 

CTAGTGGTGACGTGCCTGCGGAAATGGCTAAGATGCGCGTGTTAGAACITGATTTX3ATGAATGT 

CACGCTTCCGTGGTGACTTTGAAGAACGCATGAATAATATCATCAAGGATATTGAAGAAGATGGCCAAGTCATCC 

TCTITATCGATGAACTCCACACCATCATGGGTTCTGGTAG 

GAAACCAGCCTTGGCGCGTGGAACTTTGAGAACGGTTGGTGCCACTACTCAGGAAGAATATG^AAAACATATCGA 

AAAAGATGCGGCACTTTCTCGTCGTTTCGCTAAAGTGACGATTGAAGAACCAAOT 

TTTACAAGGTTTGAAGGCGACrTATGAGAAACATCACCGTGTACAA 

TAAGATGGCTCATCGTTATTTAACCAGTCGTCACTTGCG^GACTCTGCTATCGATCTCTTGGATGAG 

ACAGTGCAAAATAAGGCAAAGCATGTAAAAGCAGACGATTCAGATTTGAGTCCAGCTGACAAGGCCCTGATGGAT 

GGCAAGTGGAAACAGGCAGCCCAGCTAATCGCAAAAGAAGAGGAAGTACCTGTCrACAAAGACTTGGTGACAGA 

GTCTGATATTTTGACCACCTTGAGTCGCTTGTCAGGAATCCCAGTTG\AAAACTGACTCAAACGGATGCT 

TATTrAAATCTTGAAGCAGAACTCCATAAACGGGTTATCGGTCAAGATCAAGCTCrm 

TTCGCCGCAACCAGTCAGGGATTCGCAGTCATAAGCGTCCGATTGGTTCCTTTATGTTCCTAGGGCCT 

CGGGAAAACTrGAATTAGCCAAGGCTCTGGCAGAAGTTCrrrTTGACGACGAATCAGCCCT^ 

AGTGAGTATATtKjAGAAATTTGCAGCTAGTCGTCTCAACGGAGCTCCTCCAGGCTATGTAGGATATGAAGAAGGT 

GGGGAGTTGACAGAGAAGGTTCGCAATAAACCCTATTCCGTTCTCCTCTTrGATGAGGTAGAGAAGGCCCACC^ 

GATATCTTTAATGrrCTCTTGCAGCriTCT^ 

CAAATACCATT ATCAT TATGACATCGAATCTAGGTGCGACTGCCCTTCGTGATGATAAGACTGTTGGTTTT 

TAAGGATATTCGTTTTGACCAGGAAAATATGGAAAAACGCATGTTTGAAGAACTGAAAAAAGCTTAT^ 

ATTCATCAACCGTATTGATGAGAAGGTGGTCTTCCATAGCCTATCTAGTGATCATATGCAGGAAGTGGTGAAGATT 

ATGGTCAAGCCTTTAGTGGCAAGTTTGACTGAAAAAGGCATTGACTTGAAATTACAAGCTTCAGCT 

TAGCAAATCAAGGATATGACCCAGAGATGGGAGCTCGCCCACTTCGCAGAACCCTGCAAACAGAAGTGGAGGAC 

AAGTTGGCAGAAL ri L ri CTCAAGGGAGATTTAGTGGCAGGCAGCACACTTAAGATTGGTGTCAAAGCAGGCCAG 

TTAAAATTTGATATTGCATAA 

MWSKALNECIESAYMVAGHFGARYL£SWHLLIAMSNHSYSVAG 

LPFSRRLQVLFDEAEYVASVVHAKVLGTEHVLYAILHDSNALATR^^ 

EDUCALRQRHRTVADKQNSMANMMGMPQTPSGGLEDYTHDLTEQARSGKl^VIGRDK^ 

GDAGVGKTALALGLAQRL^GDVPAEMAKMRVLELDLMNVVAGTRFRGDFEERMNNIIKDIEETC 

CSGSGIDSTLDAANILKPALARGTLRTVGATTQEEYQKHIEKDA^ 

QrTDEAVETAVKMAHRYLTSRHLPDSAIDLLDEAAATVQNKAKHVKADDSDLSPADKALMDGKWKQAAQ 
PVYKDLVTESDU.TTURJ^GIPVQKLTQTDAKKYLNLEAELHKRVIGQDQAVSSISRAIRRNQSGIR^ 
TGVGKTEUJCALAEVLFDDESAURnDMSEYMEKFAASRLNGAPPGYVGYEEGGELTEKVRNKPYSVIXro 
DIFWLLQ\rt-DDGVLTDSKGRKVDFSNTniNrrSNLGAT^ 

RIDEKVVFHSI^SDHMQEVVKIMVKPLVASLT^GIDLKLQASALKIXANQGYDPEMGARFLRRTUJT^ 
UCGDLVAGSTLKIGVKAGQLKFDIAZ 

ID40 1008bp

ATGAAGAAAACATGGAAAGTGTTTTTAACGCTrGTAACAGCrCTTGTAGCT 

GAACTGCTTCT AAAG ACAACAAAGAGGCAGAACTTAAGAAGGTTGACTTTATCCTAGACTGGACACCAAATA 

ACCACACAGGGCTTTATGTTGCCAAGGAAAAAGGTTATTTCAAAGAAGCT 

CACCAGAAGAAAGTTCTTCTGACTTGGTTATCAACGGAAAGGCACCATTTGCAGTGT^ 

TAAGAAATTGGAAAAAGGAGCAGGAATCACTGCCGTTGCAGCTATTGTTGAACACAATACATCAGGAATCATCTC 

TCGTAAATCTGATAATGTAAGCAGTCCAAAAGACTTGGTTGGTAAGAAATATGGGACATGGAATGACCCAACTGA 

ACTTGCTATGTTGAAAACCTTGGTAGAATCTCAAGGTGGAGACTTTGAGAAGGTTGAAAAAGTACCAAATAA 

CTCAAACTCAATCACACCGATTGCCAATGGCGTCTTTGATACTGCTTGGATTTACTACGGTTGGGA 

GCTAAATCTCAAGGTGTAGATGCTAACTTCATGTACTTGAAAGACTATGTCAAGGAGTTTGACT^ 

TTATCATCGCAAAG^ACGACTATCTGAAAGATAACAAAGAAGAAGCTCGCAAAGTCATCCAAGCCATCAAAAAA 

GGCTACCAATATGCCATGGAACATCCAGAAGAAGCTGCAGATATTCTCATG\AGAATGCACCTGAACTCAAGGAA 

AAACGTGACTTTGTCATCGAATCTCAAAAATACTT 

TTTGACGCAGCTCGCrGGAATGCTTTCTACAAATGGGATAAAGAAAATGGTATCCTTAAAGAAGACTTGACAGAC 
AAAGGCTTCACCAACGAATTTGTGAAATAA 

MKJCTWKVFLTLVTALVAVVLVACGQGTASKDNKEAELKKVDFIL£ 

PEESSSDLVINGKAPFAVYFQDYMAKKLEKGAGn*AVAAIVEHhrrSGIISRKSDNVSSPKDLVGKKYGTWNDP^ 
KTLVESQGGDFEKVEKVPNhTO$NSrn»lANGVFDTAWIYYGWDGIU 

YUCDNKE£ARKV1QAIKKGYQYAMEHPEEAADILIKNAPELKEKRDFVTESQKY1^KEYA5DKEKWGQFDAARWNAFY 
KWDKENG1LKEDLTDKGFTNEFVKZ 

ID41 762bp

TTGATGAGAA ACTT GAGAAGTATACTGAGACGACACATTAGTCTATTGGGCTTTCTCGGAGTATTGTCAATCT 
AGTTAGCAGGTTTTCTTAAACTTCTCCCCAAGTTTAT 

GACAGAGAATTTCTCTGGCACCATAGCTGGGCGACCTTGAGACTrGGCmACTGGGGCTGATm 
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10 






TTGCCrGTCTTATGGCTGTGCTCATGGATAGTTTGACTTGGCTCAATGACCTGATTTACCCTA 
CAGACCATTCCGACCATTGCCATAGCTCCTATCCTGGTCTTGTGGCrAGGTTATGGGATTTTGCCCAAGATTGTCT 
TGATTA TCTT AACGACAACCITrCCCATCATCGTTAGTATTTTGGACGGTTTTAGGCATTGC^ 
GACLi iui ri AGTCTGATGCGGGCCAAGCCTTGGCAAATCCTGTGGCATTTTAAAATCCCAGTTAGCCTGCCTTAC 
TTTTATGCAGGTCTGAGGGTCAGTGTCTCCTACGCCTITATCAG\ACTGTGGTATCTGAGTGGTTGGGAG 
AAGGTCTTGGTG TTTAT ATGATTCAGTCTAAAAAACTGTTTCAGTATC 
TCGATTATCAGTCTTTTGGGTATGAAGCTGGTCGATATCAGTGAAAAATATCT 

MMRNLRSILRRHISLLGFLGVUrWQLAGFLm.PKF!Lm 
AVt^DSLTWU^DlJYPMMVVIQTIPTIAlAPILVLWLGYGlLPKW 
WQU.WHFKIPVSLPYFYAGUlVSVSYAfTrmv^^ 
K YV DC^VICRSZ 

TTGATTTTTAAT CCTATT TGCTGTATGATAAGGGAAAAGAAAGGGGACAGAGATATGGCTTTTACCAA 
TGCGATCTGCTAGTTTTGGTATTGTTACCAGCTTGCCTGATGACATCATTGACTCTTTTT 
T TCTTAA AAAATGTCTTTGAATTGGAAGAAGAACrCGAGTTTCAATTGCTTAATAACCAAGG 
ACITrrC AAGT CAACACCTCCCTACAGCCArrGATm 

GTACTGGTTTTAGACATGGACGGTAGAGAAACTATCCTCCTCCCAGAAGAAAATGACCTATTTTAA 

MIFNPICCMIREKKGDRDMAfThTTHMRSASrcrVTSLPDDIIDSFWYIIDHFl^^ 
HLPTAIDFDFNHPFDPRYPPRVLVLDMDGRETILLPEENDLFZ 

ID43 1569bp

ACAGCGGT GTCA TTCTATCTATITrAAGAAAAGTAATAATCAATTGTTAAAAATAGTAAAAAAA 
ATGAA ATATT TTGTTCCTAATGAG GTAT TCAGTArrCGTAAATTAAAGGTGGGGACTTGCTCGGTACTA 
TTTCAATTTTGGGAAGCCAAGGTATTTTATCGGATGAAGTTXrrTACTAGTTCTTCACCGATGGCT 
TTCTAATGCAATTACTAATGATTTAGATAATTCACCAACTGTTAATCAGAATCGTTCT^

aattcaaccactaatggtttagataattcgttaagtgttaatagcatcagctctaatggtactattcgttccaa 
cac^attagacaacagaacagttgaatctacagtaacatctactaatgaAaataagagttataaggaagatgtta 
taagtgacagaattatcaaaaaagaatttgaagatactgctttaagtgtaaaagattatggtgcagtaggtgatg 
g gatt catgatgatcgacaagcaattcaagatgcaatagatgctgcagctcaagggctaggtggaggaaatgtat 
attttcctgaaggaacttatttagtaaaagaaattgttttt^

agctacaattctaaatggtataaatattaagaatcacccttccattgtttttatgacaggttt^ 
ggtgcgcaagtagaatggggcccaacagaagatattagttattctgg 
aatgaagaaggaactaaagcaaaaaatctaccacttataaattcttcaggtgcatttc 
aacgtaactataaaaaatg taac attcaaggatagttatcaagggcatgcrattcaaattgcaggttcgaaaaat 

GTATTACTITGATAATTCTCGTTTTCnTGGGCAAGCCTTACCCAAAACGATGAAGGATG
AGAGCATTCAGATTGAACCATTAACTAGAAAAGGTTTTCCTTATG 

ATGTGACTATTCAAAATTCCTATTTTGGCAAAAGTGATAAATCTGGGGAATTAGTAACAGCAATTGGCACACACT 
TCAAA CATT GTCGACACAGAACCCCTCTAATATTAAAATTCAAAATAATCATTTTGATAACATGATGTATGCAGGT 
GTACGTTTTACAGGATTCACTGATGTATTAATCAAAGGAAA 
CATTATCGAGAAAGCGGAGCAGCTTTAGTAAATGCTTATAGCTATAAAAACACT

AAACAGGTGGTTATCGCCGAAAATATATrTAATATTGCCGATCCTAAAACAAAAGCGATACGAGTTGCAAAAGAT 
AGTGCAGAATGTTTAGGAAAAGTATCAGATATTACTGTAACAAAAAATGTAATTAATAATAATTCTAAGGAAACA 
GAACAACCAAATATTGAATTATTACGAGTTAGTGATAATTTAGTAGTCTCAGAGAATAGT 

QRCHSrYTKKSNNQLUCrVKKLEVLMKYFVPNEVF^

DLDNSPTVNQ^AEMIASNSTTNGLDNSLSVNSISSN^^ 

Al^VKDYGAVGDGIHDDRQAIQDAIDAAAQGLGGGNVYFPEGTYLVKEIVFUCSHTHLELNEKATILNGINlKNHPSIVF 
MTGUTDDGAQVEWGPTEDISYSGGTIDMNGALNEEGTKAi^ 
GSKNVLVDNSRFLGQAU»KTMKDGQIISKESIQIEPLT^ 

Cm^QNPSNIKIQNNHFDNMMYAGVRFTGFTDVUKGNRFDKKVKGESVHYRESGAALVNAYSYK^DLLDLNKQ 
WUENIFNIADPKTKAIRVAKDSAECLGKVSDITVTKNVINNNSKETEQPNIELLRVSDNLVVSENS 

ID44 324bp 

GTGATGAAAGAAACTCAGCTATTAAAAGGTGTTCTTGAAGGTTCT

TATGGTTATGAGTTGGTTCAGACTTTGCGAGAGGCTGGATTTGATACTATCGTTCCAGGAACTATT^
GCAAAAGTTAGAAAAAAATCAATGGATAAGAGGCGACATGCGCCCGTCGCCAGATGGTCCAGATCGGAAGTATTT 
TTCATTAATGAAAGAAGGAGAAGAGCGTGTCTCAGTCTTTTGGCAACAATGGGACGATTTGAGTCAAAAAGTAGA
AGGGATTAAGAATGGGGGTTAA 

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MMKETOLUCGVLEGCVLDMIGQKERYGYELVQTLREAGFDTIVPGTIYPLLQKLHKNQWIRGDMRPSPDGPDRKYFSL 
MKEGEERVSVFWQQWDDLSQKVEGIKNGGZ

ID45 816bp

ATGAAGAAAATGAAGTATTACGAAGAAACAAGCGCTTTGCTACATGAGTTTTCTG 
GAGGAGTTGTGGGAAAGTTTTAATCTTGCTGGATTTCTCTATGATGAAGACTATCTCAGAGAGCAGATCT 
TGATGCTAGATTTCTGVGAAGCAGAACGAGATGGCATGAGTGCAGAGGATTATCTAGGTAAGAATCCTAAAAAAA 
TAATGAAAGAGATTCTCAAGGGAGCACCTCCKlACnTCTATCAAAGAGTCCCTTITGACGCCAA 

GGTATTACGTTATTATCAACTACTAAGTGATTrTTCTAAAGGTCCTCTCTTAACAGTCAATT^
GGCAACTTCTTATTTTTCTGATTGGATTTGGACrTGTGGCCACAATTTTACGA^ 
AAAATGAAAATTGGCACTT ACATT GTTGTTGGGACTATAGTTCTTCTAGTTGTTTTAGGATATGTA^ 
GCTTCATACAAGAAGGAGCCTTTTATATTCCGGCTCCCTGGGaTAGTTTGTCTGTC^ 
GGTATTTGGAATTGGAAA.GAAGCGGTCTTTCGTCCATTTGTCAGTATGATTATTGCCCATCTTGTGGTGG 

GCTCCGTTATTATGAGTGGATGGGAATTTCAAATCrriTTCC

GAATCnTGTCTTGTTCCGTGGGTTTAAGAAGATAAAATGGAGTGAAGTATAG 

MK3CMKYYEmAliHEFSEENQICYFEELWESFNUGaYDEDYLREQIYlJ4MLDF 
KEIUCGAPRSSIKESUTPILVl^V^YYQLl^DFSKGPLL™ 
VVCrrrVLLVVLGYVGMASFIQEGAFYTPAPWDSI^VFTTSLVIGrWNWKEAVFRPFVSM
VFLTKVIPLAVLHGIFVLFRGFKKIKWSEV2 

ID46 348bp

CTGTTTTTTTATTTATACTCAATGAAAATCAAAGAGCAAACTAGGAAGCTAGCCGCAGGTTGCTCAAAACACTGTT

TTGAGGTTGTAGACGAAACTGACGAAGTCAGCTCAAAACATGTTTTTGAGGTTGTAGATGAAACTGACGAAGTCA 
GCTCAAAACACTGTTTTGAGGTTGTA GATGAA ACTGACGAAGTCAGCTCAAAACACTGTT^ 
AAACTGACGAAGTCAGCTCAAAACATGTTTTTGAGGTTGTAGATGAAACTGACGAAGTCAGTAACCATACATACG
GTAGGGCGACGCTGACGTGGTTTGAAGAGATTTTCGAAGAGTATTAA 






MFFYLYSMKIKEQTRKLAAGCSKHCFEVVDETDEVSSKHVFEVVDETDEVSSK^
EVSSKHVFEVVDETDEVSNHTYGRATLTWFEEIFEEYZ

ID47 1260bp



ATGCAGAATCTGAAATTTGCCTTTTCATCTATCATGGCTCACAAGATGCGTTCr^ 

TATCGGTGTTTCATCAGTTGTT GTGATTA TGCCTTTGGGTGATTC^ 

AATCTCAGAAAAATATTAGCGTCTTITTCT 

TTTTACGGTTTCTGGAAAGGAAGAGGAAGTTCCTGTTGAACCGCCAAAACCGC^ 

AGCTAAACTGAAGGGAGTGGATAGTTACTATGTAACCAATTCAACGAATGCCATCTTGACCTATCAAGATAAAA^
GGTTGAGAATGCTAATTTGACAGGTGGAAACAGAACTTACATGGACGCTGTTAAGAATGAAATTATTGC^ 
TACrCTGAGAGAGCAAGATTTCAAAGAGTTTGCAAGTGTCATTTTGCTAGATGAGGAATTGTCC^ 
GAATCrCCTCAAGAGGCrATTAACAAGGTTGTAGAAGTCAATGGATTTAGTTACCGGGTCATTGGGO 
GTCCGGAGGCTAAAAGATCAAAAATATATGGGTTTGGTGGCTTGCCTATTACTACCAATATCTCCCTTGCTGCGAA 
TTTTAATGTAGATGAAATAGCTAATATTGTCTTTCGAGTGAATGATACCAGTTTAACCCCAACTCTGGGTCC^ 
CTt3GCACGAAAAATGACAGAG( nTGCA GGCTTACAACAGGGAGAATACCAGGTGGCAGATGAGTCCGTTGTATT^ 
GCAGAAATTCAACAATCGTTTAGTTTTATGACGACGATTATTAGTTCCATCGCAGGGATTTCTCT 
GAACTGGTGTCATGAAC ATCA TGCTGGTTTCGGTGACAGAGCGCACTCGTGAGATTGGTCTTCGTAAGGCTTTG 
TGCAACACGTGCCAATATTTTAATTCAGTTTTTGATTGAATCCATGATTTTGAC 

TGACAATTGCAAGTGGTTTAACTGCCTTAGCAGGTTO

ATCAATCCCAGTCGCCCTATTTAGTCTTGCAGTTTCGGCTAGTGTrGGTATGATTTTTG 
AAGGCATCGAAACTTGATCCAATTGAAGCCCTTCGTTATGAATGA 

MQNLKFAFSSIMAHKMRSLLTMIGinGVSSVVVIMALGDSL^RQVNKD 
GKEEEVPVEPPKPQESWVQEAAKLKGVDSYYVTNSTNAILTYQDKKVENANLTGGNRTYMD
KEFASV&l^EELSISLFESPQEAINKVVEVNGFSYRVIGVYTSPEAKRSKrYGFGG^^ 
DTSLTPTLGPEI^RKMTEl^GLQQGEYQVADESVVFAEI^ 
LRIULGATRANILIQFUESMILTUGGUGLTUSGLTAI^ 
ASKLDPIEALRYEZ 



CTGATGAAGCAACTAATTAGTCTAAAAAATATCTTCAGAAGTTACCGTAATGGTGACCAAGAACTGCAG 
AAAAATATCAATCTAGAAGTGAATGAGGGTGAATTTGTAGCCATCATGGGACCATCTGGGTCTGGTAAGTCCACT 
CTGATGAATACGATTGGCATGTTGGATACACCAACCAGTGGAGAATATTATCTTGAAGGTCAAGAAGTGGCTGGG
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CTTGGTGAAAAACAACTAGCTAAGGTCCGTAACCAACAAATCGGTTTTGT 

AGCrCAATGCTCTGCAAAATGTAGAATTGCCCTTGAmACGCAGGAGTTTCGTCTTCAAAA 

TGAGGAATATTTAGACA^GGTTGAATTGACAGAACGTAGTC^ 

GCAACGTGTAGCCATTGCGCGTGCCTTGGTAAACAATC 

GATACCAAAACAGGTAACCAAATTATGGAATTATTGGTTGATTTGA^ 

ACGCATGAGCCTGAGATTGCTGCCTATGCCAAACGTGVGATTGTCATTCGGGATGGGGTCATTTCGTCTGACAGTG 
CTCAGTTAGGAAAGGAGGAAAACTAA 

MMKQUSLKNIFItfYRNGDQELQVLKNINLEVNEGEFVAIM^ 

QLAXVRNQQIGFVFWFFLl^KLNALQNVELPLIYAGVS5SKRRKI^ 

LVNNPSniJU)EPTGALDTKTGNQIMQLLVDLNKEGKTIIMVTHEPElAAYAKRQrVIRM 

ID49 1200bp 

ATGAAGAAAAAG AATGG TA AAGC TAAAAAGTGGCAACTGTATGCAGCA^TCGGTGCTGCGAGTGTAGTTGTATTG 
GGTGCTGGGGGGATTTTACTCTTTAGACAACCTTCTCAGACTGCT 
CCAAGGAAGGAj\GCGTGGCCTCCTCTCrrmATTGTCAGGGACAGT.\ACAGCAAAj^ 
TTGATGCTAGTAj\GGCTrGATTTAGATGAAATCCTTGTTTCT 

CAAGTACAGTAGTrCAGAAGCGCAGGCGGCCTATGAnCAGCTACTCGAGCAGTAGCTAGGGCAGATCGTCATAT 

CAATGAACTCAATCAAGCACGAAATGAAGCCGCTTCAGCTCCGGCTCCACAGTTACCAGCGCG\GTAGGAGGAGA 

AGATGCAACGGTGCAAAGCCCAACTCCAGTGGCTGGAAATTCTGTTGCTTCTATTGACGCTCAAT^^ 

CGTGATGCGCGTGCAGATGCTGCGGCGCAATTAAGCAAGGCTCAAAGTCAv\TTGGATGCAACAACTGTTCTCAGT 

ACCCTAGAGGGAACTGTGGTCGAAGTCAATAGCAATGTTTCTAAATCTCCAACAGGGGCGAGTCAAGTTATGGTT 

CATATTGTCA GCAAT GAAAATTTACAAGTCAAGGGAGAATTGTCTGAGTACAATCTAGCCAACCTTTCTGTAGGTC 

AAGAAGTAAGCTTTACTTCTAAAGTGTATCCTGATAAAAAATGGACTGGGAAATTAAGCTAT 

TAAAAACAATGGTGAAGCAGCTAGTCCAGCAGCCGGGAATAATACAGGTTCTAAATACCCTTATACTATTGATGT 

G AC AGG CG AGGTTGGTG ATTTG AAAC A AGGTTTTTCTGTCAAC ATTG AGGTT AAA AG CAAAACT AAGG CTATTCTT 

GTTCCTGTTAGCAGTCTAGTAATGGATGATAGTAAAAATTATGTCTGGATTGTGGATGAACAACAAAAGGCTAAA 

AAAGTTGAGGTTTCATTC^GAAATGCTGACGCAGAAAATCAAGAAATCACIT 

GTCATCAGTAATCCAACATCTTCCTTGGAAGAAGGAAAAGAGGTGAAGGCT 

MKKKNGKAKKWQLYAA1GAASVVYLGAGGILLFRQPSQTALKDEPTHLW 

ASKGDLDEILVSVGDKVSEGQALVKYSSSEAQAAYDSASRAVARADRHINELNQARNEAASAPAPQLPAPVGGEDATV 

QSPTPVAGNSVA5IDAQLGDARDAJIADAAAQLSKAQSQLDATTVLST1^GTWEVNSNVSKSPTGASQVM 

LQVKGELSEYNLANl^VGQEVSFTSKVYPDKKWTGKl^YISDYPKNNGEAASPAAGNm-GSKYPrnDVTGEVGDUCQ 

GFSVNIEVKSKTKAJLVPVSSLVMDDSKNYVWtVDEQQKAXKVEVSLGNADA£NQETTSGLTNGAKVISNmSLEEGKE 

VKADEATNZ 

ID50 759bp

ATGTCACGTAAACCATTTATCGCTGGTAACTGGAAAATGAACAAAAATCCAGAAGAAGCrAAAGCATO 

GCAGTTGCATCAAAACTTCCTTCATCAGATCTTGTO 

TTCTTGCTGTTGCAAAAGGCTrAAACCTTAAAGTTGCTGCTCAAA 

TGCrrGAAACTAGCCCACAAGTTTTGAAAGAAATCGGTACTGACTACGTTGTTATCGGTCACTCAGAAC^ 

CTACTTCCATGAAACTGATGAAGATATCAACAAAAAAGCAAAAGCAATCTTTGCGAACGGTATGCT^ 

CTGTTGTGGTGAATCACTTGAAACTTACGAAGCTGGTAAAGCTGCTGA 

TTGGCTGGATTGACTGCTGAACAAGTTGCTGCCTCAGTTATCGCTrATGAGCCAATCTGGGCTATCGGTACT 

AATCAGCTTCACAAGACGATGCACAAAAAATGTGTAAAGTTGTTCGTGACGT^ 

AAGTCGCAGACAAAGTTCGTGTTCAATACGGTGGTTCTGTTAj^ 

AGACGTTGACGGTGCCCTTGTAGGTGGTGCGTCACTTGAAGCTGAAAGCTTCTTGGaTTGCTTG 
TAA 

MSIUCPHAGWKMWNPEEAKAPVEAVASKLPSSDLVEA 
ETSPQVLKEIGTDYVVIGHSERRDYFHETDEDINKKAJCAIFANGM 

EQVAASVIAYEPIWAIGTGKSASQDDAQKMCKVVRDVVAADFGQEVADKVRVQYGGSVKPENVASYMACPDVDGAL 
VGGASLEAESFLALLDFVKZ 

ID51 1473bp

TTGAAAACAAAAATTGGATTAGCAAGTATCTGTTTACTAGGCTTGGCAACTAGTCATGTCGCT 
AAGTAGCAAAAACTTCGCAGGATACAACGACAGCTTCAAGTAGTTCAGAGC^ 

AAACGAGCGCAGAAGTA CAGACT AATGCTGCTGCCCACTGGGATGGGGATTATTATGTAAAGGATGATGGTTCTA 
AAGCTCAAAGTGAATGGATTTTTGACAACTACrATAAGGCTTGGTTTTATATTAj\TTC^ 

GAATGAATGGCA TGGAA ATTACTACCTGAAATCAGGTGGATATATGGCCCAAAACGAGTGGATCTATGACAGTAA 
TTACAAGAGTTGGTTTTATCTCAAGTCAGATGGGGCTTATGCTCATCAAGAATGGCA^TTGATTGGAAATAAGTGG 
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TACTACTTCAAGAAGTGGGGTTACATGGCTAAAAGCCAATGGCAAGGAAGTTATTTCTTGAATGOT 
ATGATGCAAAATGAATGGCTCTATGATCCAGCCTATTCTGCTrATTTTTATCTAAAATCCGAT^ 
ACCAAGAGTGGCAAAAAGTGGGCGGCAAATGGTACTATTTCAAGAAGTGGGGCTATATGGCTCGGAATGAGTGGC 
AACWCAACTACTATTTGACTGGAACrrGGTGCCATGGCGACTC^ 

TGCGGCCTCTGGTGAGCTCAAAGAAAAAAAAGATTTGMTGTCGGCTGGGTTCACA 

CTTTAATAATAGAGAAGAACAAGTGGGAACCGAACATGCTAAGAA^ 

TATCAATGATTGGAAAAAGGTTATTGATGAGAACGAAGTGGA 

AGAAGACAAGGAArrGGCGCATAACATTAAGGAGTTAAACCCrrCTGGGAATTCCTTATGGTOT 

CTATGCTGAAAATGAGACCGTGCTGAGAGTGACGCTA^CAGACCATTGAACTTATAAAGAAATACAATATGAAC 

CTGTCTTACCCTATCTATTATGATGTTGAGAATTGGGAATATGTAAA 

GGCACTTGGCTrAAAATCAT 

AGCTAT CGTA GTTTATTACAGACGCGTTTAAAACACCCAGATATTT^ 

CGAATGCTTTAGAATGGGAAAACCCTCATTATTCAGGAAAAAAAGGTTGGCAATATACCTCTTCTGA^ 
AAGGAATCCAAGGGCGCGTAGATGTCAGCGTTTGGTATTAA 

MKTKIGLASICIXGI^TSHVAANETEVAJCTCQDTIT 

SEWTFDWYKAWFYINSIX}RYSQ^WHGNYYLKSGGYMAQNEWIYDSNYKSWFYI^ 

FKKWGYMAKSQWQGSYFLNGOGAMMQNEWLYDPAYSAYnXKSIX^ANQEWQKVGCKWYYFKKWGYMA^ 

WQGNYYLTGSGAMATDEVIMIXjTRYIFAASGEUCEKKDLNVGWVHRDGKRYFFNNREEQVGTEHAXKVIDK 

RINDWKKVTDENEVDGVTVTUXjYSGKEDKEI^HNIKELNRLGIPYGVYLYTYAENET 

PrYYDVEWEYVNKSKRAPSDTGTWVKJINKYMD^ 

EWENPHYSGKKGWQYTSSEYMKGIQGRVDVSVWYZ 

ID52 774bp

ATG AAAA AATTTGCCAACCTTTATCTGGGACTGGTCTTTCT^ 
TGCCTTTAATGCTGGT GATGA TATGAATAGCTTTACAGCrrTTTAGCTGGACTC^ 

GGGAG ACTCATGCTGATTTTGGCTCAGACA'riUM'rCIM'GGCCTTCCTATG\GCCTTGATAGCGACCATTATCGGGA 

CTTTTGGTGCCATTTACATCTACCAGTCTCGTAAGAAATACCAAG 

GGTTGCGCCTGACGTTATGATTGGTGCTAGCTTCTTGATTCTCTT^ 

CCGTTCTATCT'AGTCACGTGGCCTTCTCCATTCCTATCGTGGTCnTGATGGTCTTGCCTCGACT 

TGGCGACATGATrCATGCGGCCTATGAaTGGGAGCTAGTCAATTTCAGATCn^ 

CTGACTCCGTCT ATCATT ACTGGTTATITCATGGCCrrCACCTATTCCriTAGATGACTTTGCCGTGAC 

AACAGGAAATGGCTTTTCAACCCTATCAGTCGAGATTTACT 

GCCCTGTCTGCTCTAGTCTTTCTCTTTAGTATTATCCTAGTTGTAGGTT ATTACH 

GCAAGCATGA 

MKKFANLYLGLVFLVLYU>IFYUGYAfNAGDDMNSFTC 
IYIYQSRKKYQEAFI^LNNIUIVAPDVMIGASFLILFTQLKFSLGFL™^ 
DLGASQFQMFKEIMLPYLTPSinxrYFMAJTYSLDDFAVTFFVTGNGFS^ 
G YYF1S R£KEEQ AZ 

ID59 1071bp

ATGAAAAAAATCTATTCATTTTTAGCAGGA^ 

ATAGTAAAATCAATAGTCGAGATAGTCAAAAATTGGTTATCTATAACTGGGGAGACTATATCGATCCTGAACTCTT 

GACTCAGTTTACAGAAGAAACAGGAATTCAAGTTCAGTACGAGACTTTrGACTCCAACGAAGCCATGTACACTAA 

GATAAAGCAGGGTGGAACGACCTACGATATTGCCATTCCAAGTGAATACATGATTAACAAGATGAAGGACGAAG 

ACCTCTTGGTTCCGCTTGATTATTCAAAAATTG 

TGACCCAGGTAATAAATTCTCCAT CCCTT ACITCTGGGG 

GAAGCGCCTGAGCATTGGGATGACCTTTGGAAGCCGGAGTATAAGAATTCTATCATGCrCTTTGATGGCW 

GAGGTGCTGGGACTAGGACTCAATTCCCTCGGCTACAGCCTCAACTCCAAGGATCTGCAGCAGTTGGAAGAGACA 

GTGGATAAGCTCTACAAACTGACTCCAAATATCAAGGCTATCGTTGCGGACGAGATGAAGGGCTATATGATTCAG 

AATAATGTTGCAATCGGCGTGACC TTCTCTGGTG AAGCCAGCCAAATGTTAGA 

GTGGTACC GACA GAGG CCAG CAATCTTTGGTTTGAGAATATGGTCATTCC^ 

GCCTATGCaTTATCAACmATGTTGAAACCTGAAAATGCTCTCCAAAATGCGGAGTATGTCG 

CAj\ACCTACCAGCGAAGGAATTGCTCCCAGAGGAj\ACAAAGGAAGATAAGGCCTTCTATCCCGATGTTGAAACC^ 

TGAAACACCTAGAAGTTTATGAGAAATTTGACCATAAATGGACAGGGAAATATAGCGACCTCTTCCTACAGTTTA 

AAATGTATCGGAAGTAG 

MKKIYSFl^GlAAJILVLWGIATHLDSKINSRDSQKLVIYNWGDYIDPELi-TQFTEETGIQVQY 
TTYDIAIPSEYMINKMKDEDU-VPLDYSKIEGIENIGPEFLNQSFDPGNKFSIPYTO 

KPEYOSIMLFDGAREVLGLGLNSLGYSLNSKDLQQLEETVDKLYKLTPNIKAIVADEMKGYMIQNNVAIGVTO 

QMLEKNENLRYVVPTEA5NLWTONMVIPICTVKNQNSAYAFINFMLKPENALQNA£YVGYSTPNLPAKEUJ»EET^ 

KAFYPDVETMKHLEVYEKFDHKWTGKYSDLFLQFKiMYRKZ 
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ATGAATAAAAAA CTAAC AGATTATGTGATTGATCTGCrrGGAAATTTTAAATAAACAACA^ 
GGAATATTTGATATTTTCAGTATGGTGGTTTCCATCATTGTATCTTATATTTTATTTTATG^
ACCTGTTGACTACATTATCTATACGAGTTTGGCCTTCCTGTrCTATCAATTGATGATTGGT^ 
CGAGCATTAGTCGTTACAGCAAGATTACGGATTTCATGAAAATCTTTTTTGGTGTGACrGCTA 
ATATAGTATCTGTTATGCCTTCTTGCCACTCTTCTC^ 

GATTTTATTGCCACGGATTACTTGGCAGTTAATCTACTCCAGACGCAAAAAAGGTAGTGGTGATGGAGAACACCG 
TCGGACCTTmGATTCKJTGCCGGTGATGGTGGGGCTCTTTTTATGGATAGTTA

GAACTGGTCGGTATTTTGGATAAGGATTCTAAGAAAAAGGGTCAAAAACTTGGTGCT 
ATGACAATCTGCCTGAATTAGCCAAj\CGCCATCAAATCGAGCGTGTC^ 

AGAATATGAGCGTATCTTGCAGATGTGTAATAAGCTGGGTGTCAAATGTTACAAGATGCCTAAGGTTGAAACT 
TGTTCAGGGCCTTCACCAAGCAGGTACTGGCTTCCAAAAAATTGATATTACGGACCTTTTGGGT^ 
CGTCTTGACGAATCGCGTCTGGGTGCAGAACTGACAGGTAAGACCATCTTAGTCACAGGAGCTGGAGGTTCAATC
GGTTCTGAAATCTGTCGTCAAGTTAGTCGCn"CAATCCTGAACGCATTGTCTTC 

TCTACli mi rTA TCATGAAT TGAT TCGTAAGTTCCAAGGGATTGATTATGTACCTGTGATTGCGGACATTCAAGA 
CTATGATCGTTTGTTGGAAGTCTTTGAGCAGTACAAACCTGCT^ 

CCTATGATGGAGCGCAATCCAAAAGAAGCCTTCAAAAACAATATCCGTGGAACTTACAATGTTGCTAAGGCTGTT 
GATGAAGCTAAAGTGTCTAAGATGGTTATGATTTCGACAGATAAGGCAGTGAATCCACCAAATGTTATGGGAGCA
ACCAAGCGCGTGGCGGAGTTGATTGTCACTGGCTTTAACCAACGTAGCCAATCAACCTACTGTGCAGTTCGTTTTG 
GGAATGTTCTTGGTAGCCGTGGTA GTGT CATTCCAGTCTTTGAACGTCAGATTGCTGAAGGTGGGCCTGTAACGGT 
GACAGACTTCCGTATGA CCCGT TACTTTATGACCATTCCAGAAGCTAGCCGTCTGGTTATCCATGCTGGTGCTTAT 
GCCAAAGATGGGGAAGTCTTTATCCTTGATATGGGCAAACCACTrCAAGATTTATGACTTGGCCAAGAAGATGGTG 
CTTCTAAGTGGCCACACTGAAAGTGAAATTCCAATCGTTGAAGTTGGAATCCGCCCAGGTGAAAAACTCTACGAA
GAACTCTTGGTATCAACCGAACTCGTTGATAATCAAGTTATGG 

C TTTA GAATCCATCAATCAAAAGATTGGAGAGTTCCGCACTCTCAGTGGAGATGAGTTGAAGCAAGCTATTATCG 
CCTTTGCTAATCAAACAACCCACATTGAATAA 

MNKKLTDYVIDLVEIL^QQKQVFWGIFDIFSMVVSIIVSYILf^
SKriTlFMKIFFGVTASSVl^YSlCYAFLPi^SiRFIIUnL^^ 

MDSYQHFTSELELVGILD^SKKKGQKLGGIPVLGSYDNLPEI^KRHQIERVIVAIPSLDPSEYERILQMCNKLGVKCYK 
MPKVETVV(^LHQAGTGFQKIDrTDLLGRQEIRLDESRLGAELTGKTILVTGAGGSIGSEICRQVSRFNPERIVLLG 
SrYLVYHELIRKF^mYVPVlADIQDYDRLLQVFEQYKPAIVYHAAAHKHVPMMERNPKEAFKNNIRGTYNVAXAVD 
EAKVSKMVMISTDKAVNPPNVMGATKRVAELIVTGFNQRSQSTYCAVRFGNVLGSRGSVIPVFERQIAEGGPVTVTDFR
MTRYFMTIPEASRLVIHAGAYAKIX3EVFILDMGKPVKIYDLAK^ 
DNQVMDKIFVGKVNVMPLESINQK1GEFRTLSGDEUCQA1IAFANQTTHIEZ 

ID101 1338bp


atgattgaactttatgatagttacagtcaagaaagtcgagatttacatgaaagtctagtcgctactggtctttctc 
aacttggagtggtcatcgatgc^atggttttctgcctgatggtctgctttctcctttt 
gaggatggaaaacctctctattttaatcaagttcccgtttcag a t ti ' 1 ' 1 gggaaattttaggagataatcagtctg 
cttgtattgaagatgtgacgcaggagagggctgtcattcattatgctgatggaatgcaggctcgcrtggttaaaca 

ggtagactggaaagacctagaaggtcgagtacgtcaggttgaccactacaatcgcttcggagcttgttttgctac
aacgacrtatagcgcagatagcgagccgattatgacagtttaccaagatgtcaatggtcaacaagttttactgga 
aaaccat gtgacgg gtgatatcttattgactttgccaggtcacrrccatgcgttactttgcaaataaagt^ 
atcacc 1 1 c irim'igcaagatttggaaatagataccagtcagcttatctttaatactctagcgactccrri ct1 ggt 
ttccttccatcat ccaga taaatctggctcggatgtcttggtatggcaggaaccrctctatgatgccattccaggt 

aatatgcagttgattttggaaagtgataatgtgcgtactaagaagatcatcattccaaataaggcgacttatgag
cgcgctitagagttaactgacgagaaataccatgatcagtttgtgcacttgggttatcattaccagttcaaacgtg 
ataatttcctaagac gaga tgccttaatcttgaccaattcagatcagattgagcaagtagaagcaatcgcaggag 
ccttgcctg atgt c^ctrrccgtattgcagcggtgacagagatgtcntctaagctcttag 
aatgtggccctttaccagaacgctagtccacagaagattcaggagctgtatcaactgtcggatatttacttggata 

taaaccacagtaatgagttgctacaggcagtgcgtcaggcctttgagcacaatct

gacggtgcaca ataga ctttatatcgctccagaccatctatttgaaagtagtgaagttgcrgctttggttgagacc 

attaaattcgccctttcagatgttgatcaaatgcgtcaggcaot 

acttggtg agat atcagg aaaccatg caaactgttttagg aggctaa 

mielydsysqesrdlheslvatglsqlgvvidadgflpdgllspftyylgyedgkplyfnqvpvsdfweilgdnqsacie
dvtqeravihyadgmqarlvkqvdwkdlegrvrqvdhynrfgacfatttysadsepinrrvt 
tgdilltlpgqsmryfankvefitfflqdleidtsqlifntlatpflvsfhhpdksgsdvlvwqeplydaipgnmqliles 
dnvrtkkiiipnkatyeraleltdekyhixjfvhlgyhyqfkrdnflrrdaultnsdqieqveaugalpdvtfriaavt 
emssklldmlcypnvalyqnaspqkiqelyqlsdiyldinhsnellqavrqafehnllilgfnqtvhnrlyiapdhlfe 

ssevaalvetiklatsdvdqmrqalgkqgqhanyvdlvryqetmqtvlggz
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'DIM lSIttP 

ATG ACAAT TTACAATATAAATTTAGGAATTGGTTGGGCTAGTAGCGGTGTTGAATACGCTCAAG 

CTTtjTTTTTCGGAAA TTAA ATCTGTCCTCTAAGTTTATCnTACAGATATGA 

ACAGCCAATATTGGTTTTGATGATAATCAGGTTATCTGGCTTTATAATCATTTCACAGATAT 

CTAGCGTGA CAGTGGAT GATGTCITGGCTTACTTTGGTGGTGAAGAAAGTCACAGAGA.VAAAAA 

TACGTGTATTC 1 ' 1 ' I ' 1 ' I' l GACCAAG ATAAGTTTGTAACCTGTTATTTGGTTG ATGAG AACAAGG ACTTGG7TCAACA 

TGCCGAGTATGi i l i l AAGGG AAACCTG ATTCGG AAGGATTA C 1 1 11 CI 1 AT ACGCGTT ATTGTAGCG AGTATTTT 

GCTCCCAAGGACAATGTTGCAGTCTTATACG\ACGAACTTTTTATAATGAAGACGGGACTCCAGT 

T GATG AATCAA GGGAA GGAAGAAGTTTATCATTTCAAGGATAAGATTTTCT 

CCTTTATGAAATCTTTGAATTTGAATAAGTCTGATTTGCT 

GTTTGAGGAAGCACAGACAGCAG\TCTAGCGGTAGTTGTTCATGCGGAGCATTATAGTGAAAATGCTACAAATGA 

GGACTATATCCTTTGGAATAACTATTATGACTATCAGTTTACCAATGCAGATAAGGlTGACnTCTT^ 

ACTGATAGACAAAATGAAGTTCTACAAGACCAATTTGCCAAATATACTCAGCATCAGCCAAAGAT^ 

CCTGTAGGCAGTATTGATTCCTTGACAGATTCAAGTCAAGGGCGCAAACCATTTTCATTGATTACGGCTTCACOT 

TTGCCAAAGAAAAGCACATTGATTGGCTrGTGAAAGCTGTGATTGAAGCTCATAAGGAGT^ 

TTGATATCTATGGTAGTGGTGGAGAAGATTCTCTGCTTAGAGAAATTATTGCAAATCATCAGGCAGAGGACTATAT 

CCAACTCAAGGGGCATGCGGAACTITCGCAGATTTATAGCCAGTATGAGGTCT^ 

AGGA TTTG GTCTGACCTTGATGGAAGCTATTGGTTCAGGTCTACCTCTAATTGGTTT^ 

AGACCTTTATAGAGGATGGGCAAAATGGTTATTTGATTCCAAGTTCATCTGACCATGTAGAAGACCAAATCAAGC 
AAGCTTATGCCGCTAAGATTTGTCAATTGTATCAAGAAAATC 

TGCAGAAGGCTTCTTGACCAAAGAAATTTTAGAAAAGTGGAAGAAAACAGTAGAGGAGGTGCTCCATGATTGA 
MTIYNINLGIGWASSGVEYAQAYRAGVFRKLNLSSKHTO 

VDDVl^YFGGEESHREKNGKVLRVFFFDQDKFVT-CYLVDENKDLVQHAEYVFKGNLIIUCDYFSYTRYCSEYFA?^ 

VAVLYQRTFYNEDGTPVYDILMNQGKEEVYHFKDIOFYGKQAFVRAFMKSLNLNKSDLVILDRETGIGQVVFEEAQTA 

HLAVVVHAEHYSENATNEDYILWNNTm>YQFTNADiC^ 

SQGRKPFSLn-ASRLAKEKHlDWL^VlEAHKELPELTFDIYGSGGEKlXREIIANHQAEDYIQLK 

VYLTASTSEGFGLTLMEAlGSGU>UGFDVPYGNQTFffiDGQN^ 

YSYQIAEGFLTKEILEKWKKTVEEVLHDZ 



ID103 2292bp

ATGTCCTCT CTTTC GGATCAA GAATT AGTAGCTAAAACAGTAGAGTITCGTCAGCGTCn^ 
TAGACGATATTTTGGTTGAAGCTTTTGCTGTGGTGCGTGAAGCAGATAAGCGGATTTTAGGGATGm 
TGTTCAAGTCATGGGAGCTATTGTCATGCACTATGGAAATGTTGCTGAGATGAATACGGGGGAAGGTAAGACCrT 
G AC AGCTACCATG CCTGTCTATTTG AACGCTTTTTCAGG AGAAGG AGTG ATGGTTGTG ACTCCTAATG AGTATTTA 
TCAAAGCGTGATGCCGAGGAAATGGGTCAAGTTTATCGTTTTCT^ 

ATCCAA AGAAG GAGATGAAAGCTGAAGAAAAGAAGCTTATCTATGCTTCGGATATCATCTACACAACCAATAGTA 

ATTTAGGTTTTGATTATCTAAATGATAACCTAGCCTCGAATGAAGAAGGTAAGTTm 

GATTATTGATGAAATTGATGATATCTTGCrrGATAGTGCACAAACTCCTCrGATTATTGCGGGTTCTCCT 

AGTCTAATTACTATGCGATCATTGATACACnTGTAACAACCTTGGTCGAAGGAGAGGATTATATCm 

GAAAGAGGAGGTTTGGCTCACTACTAAGGGGGCCAAGTCTGCTGAGAATTTCCT^ 

GGAAGAGCATGCGTCTTTTGCTCGTCATTTGGTTTATGCGATTCGA 

TATATCATTa3TGGAAATGAGATGGTACTGGTTGATAAGGGAACAGGGCGTCTAATGGAAATGACTAAAm 
GGAGGTCTCCAT CAGGCT ATTGAAGCCAAGGAACATGTCAAATTATCTCCTGAGACGCGGGCTATGGCCTCGATC 
ACCTATCAGAGTCTTTTTAAGATGTTTAATAAGATATCTGGTATGACAGGGACAGGTAAGGTCGCGGAAAAAGAG 
TTTATTGAAACT rACA ATATCrrCTGTAGTACGCATTCCAACCAATCGTCCGAGACAA 

ATCTATA TATCAC TTTACCTGAAAAAGTGTATGCATCCTTGGAGTACATCAAGCAATACCATGCTAAGGGAAATCC 
TTTACTCGTTTTTGTAGGCTCAGTTGAAATGTCTCAACTCTATTCGTCTCT 

ATGTCCTAAATGCTAATAATGCGGCGCGTGAGGCTCAGATTATCTCCGAGTCAGGTCAGATGGGGGCTGTGACAG 
TGGCTACCTCTATGGCAGGACGTGGTACGGATATCAAGCTTGGTAAAGGAGTCGCAGAGCTTGGGGGCTTGATTG 
TTATTGGGACTGAGCG GATG GAAAGTCAGCGGATCGACCTACAAATTCGTGGCCGTTCTGGTCGTCAGGGAGATC 
CTGGTATGAGTAAATTTTrrGTATCOTAGAGGATGATGTTATCAAGAAAm 

GTACAAAGACTATCAGGTTCAAGATATGACTCAACCGGAAGTATTGAAAGGTCGTAAATACCGGAAACTAGTCGA 
AAAGGCTCAGCATGCCAOTGATAGTGCTGGACGTTCAGCACGTCGTCAGACTCTGGAGTATGCTGAAAGTATGAA 
TATACA ACGGG ATATAGTCTATAAAG AG AG AAATCGTCTAATAG ATGGTTCTCGTGACTTAG AGG AT G 1 ' 1 U I'l GTG 
GATATCATrGAGAGATATACAGAAGAGGTAGCGGCrGATCACTATGCTAGTCGTGAATTATTGTTTCACrTTATTG 
TGACCAATATTAGTTTTCATGTTAAAGAGGTTCCAGATTATATAGATGTAACT 

TATGAAGCAGGTGATTGATAAAGAACTTTCTGAAAAGAAAGAATTACTTAATCAACATGACTTATATGAACAGTr
TTTACGACTTTCACTGCTTAAAGCCATTGATGA

GCTATCGGTGGTCAATCTGCTAGTCAGAAAAATCCAATCGTAGAGTACTATCAAGAAGCCTACGCGGGCTTTGAA 

GCTATGAAAGAACAGATTCATGCGGATATGGTGCGTAATCTCCTGATGGGGCTGGTTGAGGTCACTCCAAAAGGT 
GAAATCGTG ACTCATTTTCCATA A 

MS5UDQELVAKTVE^QR1^EGESI^DILVEAFAVVR£ADKR1LGMFPYDVQVMGAWM 

ATMPVYLNAFSGECVMVVTPNEYUKIU>AEEMGQVYRFLGL^^ 

DYLNDNLASNEEGKRRPF^nDEIDDIUJ)SAQTPUIAGSPR^ 

GAKSAENFLGIDNLYKEEHASFARHLVYAIRAHKL^ 

VKUPETRAMASriTQSLFKMFNKISGMTtn-GKVAEKEFIETYNMSVV^ 

QYHAKGNPLLVFV<3VEMSQLYSSlXraGL\HNV^^ 

LGGLIVIGTERMESQRIDLQlRGRSGR(X3DroMSKFFVSLEDDVIKKFGPSWVHKKYKDYQVQ 
LVEKAQHASDSAGRSARR<?TL£YA£SMNIQRDIVYKER^ 

NISraVKEVPDYIDm»KTAVRSFMKQVIDm^EKKELLNQHDLYEQFLRJLSLU^ 
QSA^QKNPrVEYYQEAYAGFEAMKEQIHADMVRNlXMGLVEV^ 

ID104 879bp 

ATGAAACAAGAATGGTTTGAAAGTAATGATTTTGTAAAAACAACAAGCAAGAACAAGCCTGAAGAGCAAGCT 

AGAGGTTGCAGACAAGGCTGAAGAAAGGATACCCGATCTCGATACACCAATTGAAAAAAATACTCAGTTAGAGG 

AGGAAGTCTCTCAAGCTGAAGTCGAATTGGAAAGCCAGCAAGAAGAGAAAATTGAAGCTCCTGAAGACAGTGAA 

G CG AG A A C AG AA AT AG AAG AAAAG AAGGC ATCT AATTCT A CTG A AG AAG AG CCAG ACCTTTCT AAAG A AACAG A 

AAAAGTCACTATAGCTGAAGAGAGCCAAGAAGCTCTTCCTCAGCAAAAAGCAACCACGAAAGAGCCACTTCTTAT 

CAG I^ T( ^ AGAAAGTCOTA ™^ 

TGATTTTTGGTCTTGGCTAGTGGAAGCGATCAAATCTCCTACAAGTAAGTTGGAAACAAGTATCACACACAGTTAC 

ACAGCCTTTCTCTTGCTCATTCTGTTTTCTGCATCTTCCT T I I'1'C i 'n'AGTATCTATCACATCAAACATGCTTACTAT 

GGACATATAGCAAGCATTAACAGTCGCTTCCCTGAGCAGCTAGCTCCTTTAACT 

AGTAGCGACAACACTCTTCTTCTmCATTCCTCTTGGGTAGTTrCGTTGTGAGACGAm 

GACrGGACGCTAGACAAGGTTCTCCAACAATATAGTCAACrCTTGGCAATTCCAATCTCCTCACTG 

i uli i iGU 1 1 Cl'l'I GATAGCCTACGATTTACAGCCCTCTTGTGTGTGA 

MKQEWFESNDFVKTTSKMCPEEQAQEVADKAEERIP^^^ 

EKKASNSTEEEPDLSKETEKVTIAEESOEALPQQKATTKEPLLISKSLESPYIPDQAPKSRDKWKEQVLDFWSWLVEA^ 

roKLETSrTHSYTARIXILFSASSFFFSIYHIKHAYYGHIASW^ 

QEKDWTLDKVLQQYSQLLAIPISSLLLLVSLLSLIAYDLQPSCV2 

ID106 327bp

ATGTACTTTCCAACATCCTCTGCCTTGATTGAATTTCrCATCTTGGCTGTACTGG 

TGAGATTAGCCAAACG\TTAAGCTGATCGCTAATATCAAAGAATCCACACTCTATCCCATTCTCAAAAAATTGGA 
AGG C A AT AG CTTTCTG ACAACCT ATTCT AG AG AGTTCC AAGGTCG C ATGCGCAAAT ACT A CTCCTTG ACA AA CGG 
TGGTATAGAGCAGCTCTTGACCCTAAAAGATGAATCK3GCACT 
GAGTATCCGCCATGACAAGAACTGA 

MYFPTSSALIEFUI^VI£QGDSYGYEISQTIKLMNIKESTLYPIIiCKLEGNSFLTTYSREFQGRM 
LKDEWALYTDTINGIIEGSIRHDKNZ 

ID108 9$4bp

ATGG ATTTT GAAAAAATTGAACAAGCTTATATCTATTTACT 

CCAAOTrrATGACGCOTGCmjGAGCAAAATAGCATCTATCTGGATGGTGAAACTGAGCTAA^ 

ACAACAATCAGGCCCTTAAGCGTrrAGCACTACGCAAAGAAGAATGGCTCAAGACCTACCAGmCTCTTGATGA 

AGGCTGGGCAAACAGAACCCTTGCACK3CCAATCACCAGTTTACA 

TGTGGAAGAGTTGTITAAAGAGGAGGAAATTACTATCCTCGAAATGGGTTCTGGGATGGGAATTCTAGGCGCTAT 
1 1 i Ui I GACCrCGCTTACT AAAAA GGTGGATTACTTGGG Aj\TGG AAGTGGATG ATTTGCTGATTGATCTGGCAGCT 
AGCATGGCAGATGTAATTGGTTTGCAGGCTGGCmGTCCAAGGAGATGCCGTTCGCCCACAAATGCTCAAAG^ 
AGCG ATGTGGTCATCAGTG ACTTGCCTGTCGGCTATTATCCTG ATG ATGCCGTTGCGTCGCG CCATCAAGTTG CTT 
CTA GCCAA GAAC^TACTTACGCCCATCACTTGCTCATGGAACAAGGGCTTAAGTACCTCAAGTCAGACGGA 
CTATTmCTAGCTCCGAGTGAmGTrGACCAGTCCrCA 

GAGTCTGGTTGCTATGATTAGTCTGCCT GAAAA TCTCTTTGCTAATGCCAAACAATCT 

AGAAGAAAAAT GAAAT AGCAGTAGAGCU 1 1 1 U 1 ' 1' I' ATCCACTTGCTAGCTTGCAAGATGC A AGTGTTTTAATG AA 
ATTTAAAG AAAATTTTC AAAAATGG ACTG\AGGTACTG AA AT ATAA 

MDFEKIEQAYIYLLENVQVtQSDUTNFYDALVEQNSIYLDGETELNQ^ 
GQTEPLQA^HQFI?DAlAIXLVRVEEU*EEEnUEMGSGMGlLGAIFm
GLQAGFVQGDAVRPQMLKESDWBDLPVGYYPDDAVA^

LLTSPQSDLLKEWUCEEASLVAMISI^ENLFANAKQSKTIFILQKKNEIAVEPFVYPUVSLQDASVLMKFKENFQKWTQG 



\p\}0 i<*wf 

ATGATTATTTTACAAGCTAATAAAATTGAACGTTCrmGCAGG 

TTGATGAACGAGATCGGATTGCTCTTGTTGGGAAAAATGGTGCAGGTAAGTCTACTCTTTTGA^ 

AGAAGAGGAGCCAACTAGCGGAGAAATCAATAAGAAAAAAGATATTTCTCTGTC 

TTTTGAGTCTGAAAATACCATCTACGATGAAATGCITCATGTCTTTAATGATTTGCGTCGGAC 

CGTCAGATGGAGCTGGAGATGGGTGAAAAGTCTGGTGAGGATTTGGATAAACrGATGTCAGATTATC 

TCTGAGAATTTTCGCCAAGCAGGTGGCTTTACCTATGAAGCTGATATTCGAG 

ACGAGTCTATGTGGCAGATGAAAATTGCTGAGCTTTCTGGTGGTCAAAATACTCGTTC 

CCTTGAAAAGCCCAATCTCTTGGTCTTGGACGAGCCAACTAACCACTTGGATATTGAAACCATCGCCT 

GAATTACTTGGTAAACTATAGCGGTGCCCTCATTATCGTCAGCCACGACCGTTATTTOT 

ATTACGCTAGATTTGACCAAGCATTCCTTGGATCGCTATGTGGGGAATTACTCTCGT^ 

AAAAGCTAGTTACTGAGGCAAAAAACTATGAAAAGCAACAGAAGGAAATCGCTGCTCTGGAAG 

GCAATCTAGTTCGTGCTTCAACGACTAAACGTGCTCAATCTCGCCGTAAACAA 

ACAAGCCTGAAGCTGGCAAGAAAGCAGCCAACATGACCTTCCAGTCTGAA^ 

CTGTTGAAAATGCAGCTGTTGGCTATGACGGGGAAGTCTTGTCACAACCTA^ 

TGCrGTCGCTATCGTTGGTCCAAATGGTATCGGCAAGTCAACCTTTATCAAGTCTATTGTGGACCAGATTCCTTT^ 

ATCAAGGGAGAAAAGCGCITrGGCGCTAATGTTGAGGTTGGTTACTATGACCAAACCCAAAGCAAGCTGACACCA 

AGTAA TACGGT GCTGGATGAACTCTGGAATGATTTCAAACTGACACCAGAAGTTGAAATCCGCAACCGTCTTGGA 

GCCTTCCTTTTCTCAGGAGATGATGTTAAAAAATCAGTCGGCATC 

TAGCTAAATTGTCTATGGAAAACAATAACTTTTTGATTCTGGATGAGCCGAC^ 

GGAAGTGCTAGAAAATGCCrrGATTGACTTTGATGGAACCTTGCTGTTTGTCAGTCATGATCG 

CGTGTGGCAACTCATGTTTTGGAATTGTCTGAGAATGGTTCAACTCTCTACCTTGGAGATTACGACT 

agaagaaagcaacagcagaaatgagtcagactgaggaagcttcaactagcaatcaagcaaaggaagcaagtcca 

gtcaatgactatcaggcccagaaj\gaaagtcaaaaagaagttcgcaaactcatgcgacaaatcgaaagtctaga 

agctgaaattgaagagctagaaagtcaaagccaagccatttctgaacaaatgttggaaacaa^ 

aactcatggaattacaggctgagctggacaaaatcagccatcgtcaggaagaagctatgcttgagtgggaagaat 

tatcagagcaggtgtaa 

miilqankiersfagevlfdninlqvderdrlalvgkngagkstllkilvgeeemgeinkkkdisl^ylaqdsrfese>rr 

lYDEMLHVFNDLRRTERQLRQMELEMGEKSGEDL^ 

lAEUGGQrfrRLAI^KMLI^PNLLVI^EPTNHLDlETlAW 

YVGNYSRFVEUCEQKLVTEAKNYEKQQKElAALEDFVN^ 

QSEKTSGNVVLTVENAAVGYDGEVL^QPINLDLRKMNAVArVGPNGIGKSTFDCSIVDQIPFIKGEKRFGANVEVGYYDQ 
TQSKLTPSNTVLDELWNTDFKLTPEVEIRNRIXAFLFSGDDVKXSVGMl^GGEK^ 

Dn>SKEVLENALIDFDGTLLI^SHDRYnNRVATHVLELSENGSTLYLGDYDYYVEKKATAEMSQTEEASTSNQAXEAS 
PVWYQAQKESQKEVRKLMRQIESLEAEIEEIXSQSQAISEQMLETNDADKLMELQAELDKISHRQEEAMLEWEELSEQ 



ID111 1179bp

ATGAATCGCTATGCAGTGCAGTTGATTAGCCGTGGGGCTATCAATAAAATGGGAAATATGCTCTATGATTATGGA 
AATAGTGTCTGGTTGGCTT CTAT GGGGACTATAGGAC\GACAGTTTTAGGAATGTATCAGATTTCTGAGCTCGTCA 
CATCTATTCTCGTCA ATCC CTTTGGCGGAGTTATTTCAGACCGTTTTTCTCGTCGT 
CTTGTTTCTGGGATTCTTTG TCTGGC T ATTTC mCATAAGGAATGATAGCT 

TAAC ATTGTGCAGGCTATTGe 1 1 1 i G CCTTTTCTCGC A CAG CC AAT AAAG CTATC AT AA CTG A AGTGGTGG AG AAA 
GATGAGATTGTGATCTATAATTCTCGCTTAGAGCTGGTTTTGCAGCriTGTAGGTGTTA 

CCT TGTTT TACAGTTTGCAAGTCTCCATATGACGCTACTGCTAGAC IT ITGl ' l CTAG 

TGGCTTTCmCCMAAAGAGGAAGCAAAAGTTCAAGAGAAAAAGGCrmACrrGGGAGAG 

TCAAGGATGGGTTACACTATATCTGGC^^ 

CTTTTTTGCAGCTTTTGAATTTCTACTTCCCTTTTCGAATCAGCTT^ 
TA ACTA TGGGG CCTA TTGGTTCCATCATTGGGGCTCTTCTAGCTAGTAAAATT^ 

GATTTTACTGGCTTTG ACAG GTGTCG GAGTTT TTATGATGGGATTACCACTTCCAA Ll 1 1 IC ' n ' I CCTTTTCTGGAA 

ATTTAGTTTGTGAATTGTTTATGACGATTT^ 

TTTCTTGGAAGAGTACTGAGTACAATTTTTACCTTAGCTATTCT 

CTTGCCAAGTGTCCATCTTTATTL" I l IL I IGATTATTGGACITGGAGTTGTAGCCTTATATTTCTTAGCTCTCGGAT 
ATGTTCGAACTCATTTTGAAAAATTGATATAA 

MNRYAVQLISRGAINKMGNMLYDYGNSVWLASMGTIGQTVLGMYQ 

CGILCLAISFIRNDSWMIGALIVANIVQA1AFAFSRTANKAIITEVVEKDEIVIYNSRLELVLQVVGVSSPVLSFLVLQFASL 
H^fITJ-LDSLTFFlAFVLVAFLPKEEAKVQEKKAFTGR
QLYGSEGAYASILTMGAICSHGALl^SKIKAMYNLLILLAU^

QTKVESEFLGRVL^IITLAlLFMPlAKGFMTVLPSVHLYSFLnGLGVVALYFl^LGYVRTHFEiaiZ 
ID113 2466bp



ATGCAAAATCAATTAAATGAATTA.\AACGAAAAATGCTGGAATTTTTCCAGCAAAAACAAAAA^ 
GCTAGACCTGGCAAGAAAGGTTCAAGTACCAAAAAATCTAAAACCTTAGATAAGTCAGCCATTTTCCCAGCT^ 
TTAC TGAGTATAAAAGCCTTATTTAACrTACTCTTTGTACTCGGTTTTCTAGGAGGAATGTTGGGAGCT 
CTTrGGGATACGGAGTGGCCTTATTTGACAAGGTTCGGGTGCCTCAGACAGAA^ 

1U ACATCTCTTCTArrTCAGAGATTACCTATTCGGACGGGACGGTGATTGCTTCCATAGAGAGTGATn 

TTCTATCTCATCTGAGCAAATTTCGGAAAATCTGAAGAAGGCTATCATTGCGACAGAAGATGAACAC^ 
ACATAAGGGTGTAGTACCCAAGGCGGTGATTCGTGCGACCTTGGGGAAATTTGTAGGTTTGGGTrCCT 
GGTTG^ACCTTGACCCAGCAACTAATTAAACAGCAGGTGGTTGGGGATGCGCCGACCrTGGCTCGTAAGGCG 
GAGATTGTGGATGCTCTTGCCTTC<iAACGCGCCATGAATAAAGATGAGATTTTAACGACCT ATCTCAATGTGGCTC 

CCTTTGGCCGAAATAATAAGGGACAGAATATTGCAGGGGCTCGGCAAGCAGCTGA

CCAGTCAGTTGACTGTTCCTCAAGCAGCATTTTTAGCAGGACTTCCACAGAGTCCCATTACTTACTCT 
AAATACTGGGGAGTTGAAGAGTGATGAAGACCTAGAAATTGGmAAGACGGGCTAAGGCAGTrCTITACAGTAT 
GTATCGTACAGGTGCATTAAGCAAAGACGAGTATTCTCAGTACAAGGATTATGACCTTAAACAGGACTTTTTACC 
ATCGGGCACGGTTACAGGAATTTCACGAGACTATTTATACTTTACAACTTTGGCAGAAGCTCAAGAACGTATGT 

GACTATCTAGCTCAGAGAGACAATGTCTCCGCTAAGGAGTTGAAAAATGAGGCAACrCAGAAGTTTTATCGAGAT
TTGGCAGCCAAGGAAATTGAAAATGGTGGTTATAAGATTACTACTACCATAGATCAGAAAATTC 
CAAAGTGCGGTTGCTGATTATGGCTA TCTTT TAGACGATGGAACAGGTCGTCrrAGAAGTAGGGAATGTCTTGATC 
GATAACCAAACAGGTGCTATTCTAGGCTTTGTAGGTGGTCGTAATTATCAAGAAAATCAAAATAATCATGCCTT^ 
ATACCAAACGTTCGCCAGCTTCTACTACCAAGCCCTTGCTGGCCTACGGTATTGCTATTGACCAGGGCTTGATGGG 

AAGTGAAACGATTCTATCTAACTATCCAACAAACTTTGCTAATGGCAATCCGATTATGTATGCTAATAGCAAGGG
AACAGGAATGATGACCrTGGGAGAAGCTCTGAACTATTCATGGAATATCCCTGCTTACTGGACCTATCGTATGCTC 
CGTGAAAAGGGTGTTGATGTCAAGGGTTATATGGAAAAGATGGGTTACGAGATTCCTGAGTACGGTATTGAGAGC 
TTGCCAATGGGTGGTGGTATTGAAGTCACAGTTGCCCAGCATACCAATGGCTATCAGACCTTAGCTAATAATGGA 
GTTTATCATCAGAAGCATGTGATTTCAAAGATTGAAGCAGCAGATGGTAGAGTGGTGTATGAGTATCAGGATAAA 

CCGGTTCAAGTCTATTCAAAAGCTACTGCGACGATTATGCAGGGATTGCTACGAGAAGTTCTATCCTCTCGTGTGA
CAj\CAj\CCTTCAj\GTCTAACCTGACTTCTTTAAATCCTACTCTGGCTAATGCAGATTGGATTGGG 
AACCAACCAAGACGA^Aj\TATGTGGCTCATGCTTTCGACACCTAGATTAACCCTAGGTGGCTGGATTGGGCATGA 

tgataatcattcattgtcacotagagc^ggttattctaataactctaattacatggctcatctggta^ 
cagcaagcttccccaagcatttgggggaacgagcgctttgctttagatcctagtgtagtgaaatcggaagtcttg 
aaatc a ac aggtcaa aaacc ag ag aaggtttctgttg a agg aaaag aagt ag ag gtc acaggttcg actgtt acc
agctattgggctaataagtcaggagcgccagcgacaagttatcgctttgctattggcggaagtgatgcggattat 
cagaatgcttggtctagtattgtggggagtctaccaactccatccagctccagcagttcaagtagtagttctagcg 
atagcagtaactcaagtactacacgaccttcttcttcaagggcgagacgataa 

mqnqlnelkrkmleffqqkqknkksaju>gkkgsstkksktld^

GVALFDKVRVPQTEELVNQVKDlSSlSErTODGTVlASIESDL^ 
ATLGKFVCLGSSSGGSTLTQQLIKQQVVGDAPTl^RKAj\EIVDALALERA^ 

ROAAEGIFGVDA^QLTVPQAAfl^GLPQSPrTYSPYE^GELKSDEDLEIGLRRAXAVLYSMYRTGALSKDEYSQYKDY 
DLKQDFLPSGTVTGlSRDYLYFITLAEAQERMYDYI^QRDrfVSAjCELKNEA^^ 
HSAMQSAVAI)YGYLLDDGTGRVEVGNVLMDNQTGAILGFVGGRN^^

LMGSETll^NYPTNFA^GNPlMYANSKGTGMMTLGEALNYSWNlPAYWTYRMLREKGVDVKGYMEKMGYEIPEYGIE 
SLPMGGGEVTVAQHTNGYQTI^NNGVYHQKHVISKlEAj^ 

TFKSNLTSLNPTLANADWIGKTGTTNQDENMWLNU^RLTLGGWIGHDDNHSl^RRAGYSNiNSNYMAHLVNAIQQA 
SPSIWGNERFALDPSVVKSEVXJCSTGQKPEKVSVEGKEVEVTGSTVTSYWANKSGAPATSYRFAIGGSDADYQNAWSSI 
VGSLPTPSSSSSSSSSSSDSSNSSTTRPSSSRARRZ

ATGAj\AAA^TTTTATGTAAGTCCAATTmCCTATTCT 

TAi 1 11 i GTTAATAATAATCTGTTGACGGTTTTAATTTTG ri'I CI'rrri'GTAGGAGGCTATG 1 Mill 1 ATTTAAGAA

ACTGAGAGTGCATTATACAAGGAGTGATGTAGAACAGATACACrrATGTAAACCACCAAGCGGAAGAAAGTTTGAC 
AGCTCTATTGGAACAGATGCCTGTAGGTGTTATGAAATTGAATTTATCTTCTGGAGAGGTTGAGTGGTTTAATCCC 
TATCCTGAj\TTGATTTTGACCAAGGAAGATGGTGATTTTGATTTAGAAGCTGTTC>AACGATTATC^ 
TAGGAAATCCGTCTACTTATGCCAj\GaTGGTGAGAAGCGTTATGCTGTTCATAT^ 

GTATTTTGTAGATGTATCCAGGGAACAAGCCATAACAGATGAATTGGTAACAAGTAGACCAGTGATTGGGATTGT
CT CTGT GG AT AATT A TG AT G ATTTGG AGG ATG A A ACTTCTG AGTC AG AT ATT AGTC AA ATCAAT AGTTTTGT AG CT 
AATTTTATATCAGAGTmCAGAAAj\ACACATGATGTTTTCTCGTCGGGTA>GTATGGATCGATm 
TGACTACACGGTGCTTGAGGGCTTGATGAATGATAAATTrrCTGTTATTGATGCTTTCAGAGAAGAGTCGAAACAG 
AGACAGTTGCCCTTGACCTTAAGTATGGGATTTTCTT^^ 

TGCTCAj\TTTGAACTTGGCTGAAGTACGTGGTGGCGACCAGGTGGTTGTTAAGGAAAACGACGAAACGAAAAATC
CAGTTTATTTTGGTGGTGGGTCTGCTGCTTCAATCAAGCCT

TTCAGATAAGATTCGGAGTGTAGATCAGGTTTTTGTAGTCGGTCACAAAAATTTAGACATGGATGCTTTGGGCTCT 
GCTGTAGGTATGCAGTTGTrCGCCAGCAATGTGATTGAAAATAGCTATGCTCnTTATGATGAAGAACAAATGTCTC 
CAG AT ATTG AAC G A G CTGTTTC ATTCAT AG A AAA AG AAGG AGTT A CG AAGTTGTTGTCTGTT A AGG ATG CAATG G 
GGATGGTGACCAATCGTTL i 1 1 0 VI GATTCTTGTAGACCATTCAAAG ACAGCCTTAACATTATCAAAAGAATTTTA

TGATTTATTTACCCAAACCATTGTTATTGACCACCATAGAAGGGATCAGGATTTTCG\GATAATGCGGTTATTACT 
TATATCGAAAGTGGTGCAAGTAGTGCCAGTGAGTTGGTAACGGAATTGATTCAGTTCCAGAATTCTAAGAAAAAT 
CGTTTGAGTCGTATGCAAGCAAGTGTCTTGATGGCTGGTATGATGTTGGATACTAAAAATTTCACCTCGCGAGTAA 
CTAGTCGGACATITGATGTTGCTAGCTATCTCAGAACGCGCGGAAGTGATAGTATTGCTATCCAGGAAATCGCTG 
GACAGATTTTGAAGAATATCGTGAGGTCAATGAACTTATTTTACAGGGGCGTAAATTAGGTTCAGATGT

GCAGAGGCTAAGGACATGAAATGCTATGATACAGTTGTTATTAGTAAGGCAGCAGATGCCATGTTAGCCATGTC^ 

GGTATTGAAGCGAGTTTTGTTCTTGCGAAGAATACACAAGGATTTATCTCTATCTCAGCTCG 

TGAATGTACAACGGATTATGGAAGAGTTAGGCGGTGGAGGCCACTrTAATITGGCAGCAGCTCAAATTAAAGATG 

TAACCTTGTCAGAAGCAGGTGAAAAACTGACAGAAATTGT ATT AAATGAAATG AAGG AAAAGGAGAAAGAAGAA 
TGA

MKKFYVSPIFPILVGLIAFGVLSTHIFVNNN 

QMPVGVMKLNI^SGEVEWFNPYAELILTKEDGDFDIXAVQTIIKASVGNPSTYAKLGEKRYAVHMDASSGVLYFVDVS 
I^QA!TOELVTSRPVIGIVSVD^DDL£DETSESblSQW 

DKFSVIDAFREESKQRQLPLTLSMGFSYGDGNHDEIGKVALLNLNLAEVRGGDQVVVKENDETKNPVYFGGGSAASK

RTRTRTRAMMTAISDKIRSVDQVI^GHKNLDMDALGSAVGMQLFASNVENSYALYDEEQMSPDIERAVSFIEKEGV 

TKIXSVKDAMGMVTNRSLLILVDHSKTALTl^KEFYDLFTQTIVI^ 

NSKKNRI^RMQASVLMAGMMUyTKNFTSR^SRTFD^ 

LUEAKDMKCYDTVVISKAADAMLAMSGLEASFVLAO^ 
LSEAGEKLTEIVLNEMICEKEKEEZ

ID115 66?bp

ATGAAGTGCTTGTTATGTGGGCAGACTATGAAGACTGTm 
ACTCTTGTCTTTGTTCAGACTGTGATTCTACTTTTGAAAGAATTGGGGAAGAGAACT^

AGAGTTGTCAACAAAGTGTCAAGATTGTCAACTTrGGTGTAAAGAGGGAGTTGAAGTCAGTCATAGAGCGATTTT 
TACTTACAATCAAGCTATGAAGGATTTTTTCAGTCGGTATAAGTTTGATGG^^ 

GCTTCA mm l AAGTGAGGAGTTGAAAAAGTACAAAGAGTATCAATTTGTTGTAATTCCCCTAAGTCCTGATAGAT 
ATGCTAATAGAGGATTTAATCAGGTTGAGGGCTTGGTAGAGGCAGCAGGCTITGAGTATCTGGATTTATTAGAGA 
AAAGAGAAGAGAGAGCCAGTTCTTCTAAAAATCGTTCAGAGCGCTTGGGGACAGAACrrCCTTTCm

GTGGAGTCACTATTCCTAAAAAAATCCTACTTATAGATGATATCTATACTACAGGAGCAACTATAAATCGTGT^ 
GAAACTGTTGGAAGAAGCTGGTGCTAAGGATGTAAAAACATTTTCCCTTGTAAGATGA 

MKCLLCGQTMKTVLT7SSLUXRNDDSCLCSDCDSTFERIGEENCPNCMKTELSTKCQDCQLWCKEGVEV 
NQAMKDFFSRYKFDGDFLLRKVFASFLSEELKKYKEYQFVVIPLSPDRYANRGFNQVEGLVEAAGFEYU)UEKREER
ASSSKNRSERLGTEU>FFIKSGVTIPKKIU.IDDIYTTGATINRVKKLIXEAGAKDVKTFSLVR2 

ID116 1299bp

ATGAAAGTAAATTTAGATTATCTCGGTCGTTTATTTACTGAGAATGAATTAACAGAAGAAGAACGTCAGTTC

G AG AAACTTCC AG C AATG AG AAAGG AG AAG G GG AAACTTTTCTGTC AACGCTGT AAT AGT ACT ATTCT AG AAG AA 
TGGT ATTT GCCCATCGGTG CTTA CTATTGTCGAGAGTGCn'GCTGATGAAGCGAGTCAGAACTGATCAAACTTTAT 
ACTATrrTCCGCAGGAGGATTTTCCAAAGCAAGATGTTCTCAAATGGCGCGGCCAATTAACTCCTm 
GGTGTCAGAGGGATTGCn*CAAGTAGTAGACAAGCAAAAGCCAACCITAGTTCATGCGGTAACAGGAGCTGGAAA 

GACAGAA ATGAT TTATCAAGTAGTGGCTAAAGTGATCAATGCGGGTGGTGCAGTGTGTTrGGCTAGTCCTCGCAT
AGATGTTT GTTT GG AGCTGTACAAGCGCCTGCAACAGG A 1 1 1 riCITGCGGGATAGCTTTGCTACATGGAGAATCG 
GAACCTTATTTTCGAACACCACTAGTrcrrrGCAACAACCCATCAGTrATTGAAG 

GATAGTGGATGAAGTAG ATGCTm CCTrATGTTGATAATCCCATGCTTTACCACGCTGTCAAGAATAGTGTAAAG 
GAGAATGGATTGAGAATCTTTTr AACAG CGACTrCGACCAATGAGTTAGATAAAAAGGTCCGTTTAGGAGAACT 
AAAAGACTGAATTTACCGAGACGGTTTCATGGAAATCCGTTGATTATTCCAAAAC^

ATCGCTACT TAGAC AAGAATCGTTTGTCACCAAAGTTAAAGTCCTATATTGAGAAGCAGAGAAAGACAGCTTATC 
CGTTACTCATTTTTGCTTCAGAAATTAAGAAAG 

AGAAAATTGGCTTTGTATCTTCTGTAACAGAGGATCGATTAGAGCAAGTACAAGCTrTTCGAGATGGAGAACTGA 
C AAT A CTT ATC AGT ACG AC AATCTTGG A G CGCG G AGTT ACCTTCC CTTGTGTGG ATGTTTTCGT AGT AG AGG CC AA 
TCATCG TTTGT TTACCAAGTCTAGTTTGATTCAGATTGGTGGACGAGTTGGACGAAGCATGGATAGACCGACAG

GATTTGCi 1 1 i U 1 l CCATG ATGGGTTAAATGCTTCAATCAAGAAGGCG ATTAAGGAAATTCAG ATG ATG AATAAGG 
AGGCTGGTCTATGA 

MKVNLDYLGRLFTENELTEEERQljVEKLPAMRKEKGKLFCQRCNSTILEEWYLPIGAYYCRECLLMKRVRSDQTLYYF 
PQEDFPKQDVLKWRGQLTPFQEKVSEGLLQVVDKQKmVHAVTGAGKTEMIYQVVAKVINAGGAVCLASPRIDVCL
ELYKW.QQDreCGIALLHGESEPYraTPLVVATT^
LTATSTNEU3KKVRLGELKRLNLPRW=HGNPLnPKP^^ 
Q^EILQEQFPNEKIGFVSSVTEDRLEQVQAPRDGELTILISTTIL^ 
MDRPTGDLLFFHDGLNASEKKAIKEIQMMNKEAGL2 

ID117 870bp

ATGCAAATTCAAA AAAGT TTTAAGGCGCAGTCT 
CTAGATGATATGAOTITCGT^ATCCAGACCTTGAAAGA^ 
ACAGGGOTTTGCTCAAGCATTTTGACATTrCC^CCAAGCAGATCACrrm 

ATTCCTGATTTGATTGG 1 1 1 L ri GAAAGCAGGGCAAAGTATTGCTCAGGTCTCTGATGCCGGTTTGCCTAGCATTT 
CAGACCCTGGTCATGATTTAGTTAAGGCAGCTATTGAGGAAGAAATTGCAGTTCrrGACACrrT 
AGGAATTTCTGCCITGATT^CAGTGGTTTAGCGCCACAGCCACATA^ 

GGTCAGCAGAAGCAAi i i 1 1 i GGCTTGAAAAAAGA'lTAl'CCTGAAACACAGATTTTTTATGAATCACCTCATCGTG 
TAGCAGACACGTTGGAAAATATGTTAGAAGTCTACGGTGACCGCTCCGTTGTCTTGGTCAG^

TCTATGAAGAATACCAACGAGGTACTATCTCTGAGTTATTAGAAAGCATTGCTGAAACGCCACTCAAG 
GTCTTCTCATTGTTGAGGGTGCCAGTCAGGGTGTGGAGGAAAAGGACGAGGAAGACTTGTTCGTAGAAATTCA 
CCCGCATCCAGCAAGGTGTGAAGAAAAACCAAGCTATCAAGGAAGTCGCTAAGATTTACCAGTGGAATAAAAGTC 
AGCTCTACGCTGCCTACCACGACTGGGAAGAAAAACAATAA 






MQIQKSFKGQSPYGKLYLVATPIGNLDDMTFRAIQTLKEVDWIAAEDTRhTTGLLLKHFDISTKQISFHEHNAK 
GFLKAGQSIAQVSDAGLPSISDPGHDLVKAAJEEEUVVTVPGASAGISALI^^ 

DYPETQIFYESPHRVADTLENMLEVYGDRSWLVRELTKIYEEYQRGTISELLESIAETPLKGECLL1VEGASQGVEEKDE 
EDLFVEIQTRIQQGVKKNQAIKEVAXIYQWNK5QLYAAYHDWEEKQZ 

ID118 345bp



ATGATAAAGAAAGGAAAGGGCTGTTTTATGGACAAAAAAGAATTATTTGACGCGCTGGATG 
TTATTGGTAACCTTAGCCGATGTGGAAGCCATCAAGAAAAATCTCAAGAGCCTGGTAGAGGAAAATACAGCTCTT 
CGCTTGGAAAATAGTAAGTTGCGAGAACGCTTGGGTGAGGTGGAAGCAGATGCTCCTGTCAAGGCCAAGCATGTT 
CGCGAAAGTGTCCGTCGTATTTACCGTGATGGATTTCACGTATGTAATGATTTTTATGGACAACGTCGAGAGCAGG 
ACG AAG AATGT ATGTTTTGTG ACG AGTTGTT AT A CAGGG AGT AA 

MIKKGKGCFMDKKELFDALDDFSQQLLVTLADVEAIKKNUCSLVEE^ALRLENSKLRERLGEVEADAPVKAKH 
33 VRR1YRDGFHVCNDFYGQRREQDEECMFCDELLYREZ 

ID119 639bp

ATGTCAJ^GGATTTTrAGTCTCTOT^ 

CCAATTTTAG AGGAA AAAGGAGTAGAGGTGTTGACGACCCGTGAACCTGGCGGAGTCTTGATTGGGGAGAAGATT
CG GG AAGTG ATTTTGG ATCCAAGT C AT A CTC AG ATGG ATG CT AAAAC AG AG CTA CTTCTCT AT ATTG CC AGTCGCA 
G ACAGCATTTGGTGG AAAA AGTTCTTCCAGCCCTTG AAG CTGGCAAGTTGGTCATCATGG ATCGTTTT ATCG ATAG 
TrCTGTTGCCTATCAGGGATTTGGTC GTGG CTTAGATATTGAAGCCATTGACTGGCTCAATCAGTTTGCGACAGAT 
GGCCTCAAACCCGATTTGACACTCTATTrTGACATCGAGGTGGAAGAAGGGCTGGCTCGTATTGCTGCTAATAGTG 

ACCGCGAGGTTAATCGTTTGGATTTGGAAGGGTTGGACTTGCATAAAAAAGTTCGTCAAGGCTACCTTTCTCTTCT
GGATAAAGAGGGAAATCGCATTGTCAAGATTGATGCTAGTCTCCCTrrGGAGCAAGTTGTGGAAACTACCAAGGC 
TGTCTTGTTTGACGGAATGGGCTTGGCCAAATGA 

MSKGFLVSLEGPEGAGKTSVLEAIXPILEEKGVEVLTTREPGGVU^ 
KVLPALEAGKLVIMDRFIDSSVAYQGFGRGLDIEAJDWLNQFATDGLKPDLTLYFDIEVEEGLARIAANSDREVNRLDL
EGLDLHKKVRQGYLSLLDKEGNRIVKIDASLPLEQWETTKAVLFDGMGLAXZ 

ID120 408bp

ATGGTAGAACAAAGAAAATCAATTACCATGAAAGATGTTGCTTTAGAAGCAGGAGTTACrrGTT^
CGTGTAATTAATAAAGAAAAAGGCArrAAAGAAGTAACTTTGAAAAAAGTGGAACAAGC^ 
TACATTCCAG ATTACT ACGCTAGAGGAATGAAAAAAAATCGAACAGAAACGATTGCAATCATTGTACCAAGTATC 
TGGCATCCli ilii 1 1 CAG AATTTGCT ATG C ATGTG G AAAATG AAGTCT AT A AG AG A AAT AA C A AATT A CTCTT AT 
GTTCTATCAATGGTACAAATAGAGAGCAAGACTATCTGGAGATGTTGCGTCATAATAAAGTTGATGGAGTGGTTG 

CCATTACCTATAGGCCAATTGAACATTACTTGACGTCAGGAATrCCCTTTGTTAGTATTGACCGCACATACTCAGA
GATTGCCATTCCTTGTGTTTCA 






MVEQRKSn^KDVALEAGVSVGTVSRVINKEKGIKEVTU^ 

SEFAMHVENEVYKRNNKLLLCSINGTNREQDYLEMLRHNKVDGVVAITYRP1EHYLTSGIPFVSIDRTYSEIAIPCVS

\P\2\ 28ft P

ATGAATATATTTAGAACAAAGAATGTTAGTTTAGATAAAACAGAGATGCATAGGCATTTGAAGT^ 
ATTTTGCTGGGTATCGGAGCCATGGTAGGGACAGGCGTCTrTACAATCACAGGTACTGCAGCTGCAACACT^ 
GCCCAGCCCTAGTGATTTCAATCGTTATTTCTGCCTTGTGTGTGGGATTATCAGCCCTCTTT^

TCGCGAGTACCCGCTACAGGAGGTGCCTATAGTTACCTCTATGCTATCTTAGGAGAATTCCCTGCCTGGTTGGCT 
GTTGGTTAACGATGATGGAGTTCATGACAGCCATATCAGGCGTAGCTTCGGGTTGGGCAGCTTATTT^ 

MNIFRTKNVSLDKTEMHRHUCLWDUIX^ 
ATGGAYSYLYA1LGEFPAWLAGWLTMMEFMTAISGVA5GWAAYF

ID124 l?Ubp 

ATGAAATCAAGAGTAAAGGAAACGAGTATGGATAAAATTGTGGTTCAAGGTGGCGATAATCGTCTGGTAGGAAGC 
CTGACGATCGAGGGAGCAAAAAATGCAGTCrrACCCTTGTTGGCAGCGACTATrCTAGCAACT^
GTCTTGCAGAATGTTCCGATTTTGTCGGATGTCTTTAT^ 
ACTTrGATGAGGAAGCTCATCTTGTCAAGGTGGATGCT^ 

TCAGCAAGATGCGCGCCTCCATCGTTGTATTAGGGCCAATCCTTGCCCGTGTGGGTCATGCCAAGGTATCCATGCC 
AGGTGGTTGTACGATTGGTAGCCGTCCTATTGATCITCATrrGAAAGGTCTGGAAGCTATGGGGGTrAAGA 
CAGACAGCTGGTTACATCGAAGCCAAGGG^GAACGCTTCCATGGTGCTCATATCTATATGGACTTTCCAAGTGTT 
GTGCAACGCAGAACTrGATGATGGCAGCGACTCTGGCTGATGGGGTGACAGTGATTGAGAATGCTGCGCGTGAGC 
CTGAGATTGTTGACTTAGCCATTCTCCTTAATGAAATGGGAGCCAAGGTCAAAGGTGCTGGTACAGAGACTATAA 
CCATTACTGGTGTTGAGAAACTTCATGGTACGACTCACAATGTAGTCCAAGACCGTATCGAAGCAGG^ 
GGTAGCTGCTGCCATGACTGGTGGTGATGTCTTGATTCGAGACGCTGTCTGGGAGCACAACCGTCCCTTGATTGCC 
AAGTTACTTGAAATGGGTGTTGAAGTAATTGAAGAAGACGAAGGAATTCGTGTTCGTTCTCAACT 
AAAGCTGTTCATGTGAAAACCTTGCCCCACCCAGGATTTCCAACAGATATGCAGGCTCAATTTACAGCCTr 
CAGTTGCAAAAGGCGAATCAACCATGGTGGAGACAGTTTTCGAAAATCGTTTCCAAACCTAGAAGAGATGCGCCG 
CATGGGCTTGCATTCTGAGATTATCCGTGATACAGCTCGTATTGTTGGTGGACAGCCTTTGCAGGGAGCAGAAGTT 
CTTTCAACTGACCTTCGTGCCAGTGCGGCCTTGATTTTGACAGGTTTGGTAGCACAGGGAGAAACT 
AATTGGTTCACTTGGATAGAGGTTACTACGGmCCATGAGAAGTTGGCGCAGCTAGCTGCTAAGA
TGAGG C AAGTG ATG AAG ATGAATAA 






MKSRVKETSMDKIVVQ(K3DNRLVG$VTIEGAKNAV^ 
EEAHLVKVDATGDrTEEAPYKYVSKMRASIWLGPU^RVGHAKVSM 

AKAERl^GAHIYMDFPSVGATQNLMMAATI^DGVTVmNAAREPEIVDLAILLNEMGAKVKGAGTEm 
TTHWVQDRIEAGTFMVAAAMTGGDVURDAVWEHNRPUAKLLEMGVEVIEEDEGIRVRSQLENU^ 
GFPTDMQAQFTALMTVAKGESTWVETVFENRFQHLEEMRRMGLHSEURDTA 
TGLVAQGETVVGKLVHLDRGYYGFHEKLAQLGAKIQRJEASDEDEZ 

ID125 1101bp

ATGTTATTAGCGTCAACAGTAGCCTTGTCATTTGCCCCAGTATTGGCAACTCAAGCAGAAGAAGTTCTTTGGACT 
C ACGT A GTGTTG A G C A AATCC AAA ACG ATTTG ACT AAAACG G AC AA C AAA A C AAGTT AT ACCGTAC AGTATGGTG 
ATACTTTG AGCA CCATTGCAGA AGCCT TGGGTGTAGATGTCACAGTGCTTGCGAATCTGAACAAAATCACTAATAT 

GGACrrGATTTTCCCAGAAACTCriTTTGACAACGACTGTCAATGAAGCAGAAGAAOT

AACACCrCAAGCAGACTCTAGTGA AGAAG TGACAACTGCGACAGCAGATTTGACCACTAATCAAGTGACCGTTGA 
TGATCAAACTGTTCAGCriTGCAGACCnTTCTCAACCAATTGCAGAAGTTACAAAGACAGTGATTG 
GTGGCACCATCTACGGGCACTTCTGTCCCAGAGGAGCAAACGACCGAAACAACTCGCCCAGTTGCAGAAGAAGCT 
CCTCAGGAAACGACTCC^GCTGAGAAGCAGGAAACACAAACAAGCCCTCAAGCTGCATCAGCAGTGGAAGCAAC 

TACAACAAGTTCAGAAGCAAAAGAAGTAGCATCATCAAATGGAGCTACAGCAGCAGTTTCTACTTATCAACCAGA
AGAAACGAAAGTAATTTCAACAACITACGAGGCTCCAGCrcCGCCCGATTATGCT 
TGAAAATGCAGGTCTTCAACCACAAACAGCTGCCrrrAAWGAAGAAATTGCTAACTTGTTT 
AGTGGTTATCGTCCAGGAGACACrrGGAGATCACGGAAAAGGTTTGGCTATCGACTTTATGGTACCAGAACm 
GAATTAGGGGATAAGATTGCGGAATATGCrATTCAAAATATGGCCAGCCGTGGCATTAGTTACATCATCTGGAAA 

CAACGTTTCTATGCTCCATTCGATAGCAAATATGGGCCAGCTAACACTTGGAACCCAATGCCAGACCGTGGTAGT
GTGACAGAAAATCACTATGATCACGTTCACGTTTCAATGAATGGATAA 






MlXASTVAl^FAPVlj\TQA£EVLWTARSVEQIONDLTXTDNKTSYTVQYGDTLS^ 
IFPETVLTTTVNEAEEVTEVEIQTPQADSSEEVTTAT^^ 

VPEEQTTETTRPVAEEAPQETTPAEKQETQTSPQAASAVEATTOSEAKEVA^ 

APDYAGLAVAKSENAGLQPQTAAFKKKLLTCI^LHPLVVIVQ^ 

VALVTSSGNKVSMLHSIANMGQLTLGTQCQTVVVZQKn-MrrFTFQZMD



TTGTT^GAAAAATAAAGACATT 
5^I G 2£? GACAG ™ m ^ 

5 AG £ A IE£ TCAK ^ 

GAC^GTQjAAGTTGGCCTATCATGTGACTGAGGCGTTGAAGATTACCTrACT 

TGTCCATCTTCGCTGGGAAAGAGATGATAGGACTTTTGGGGACGGAGAGGGATGTAGCTGAGA 
ATCTATCTTTGGTAGGCGGATQjATTGTTCTC^ 

I^ C ^ CGTCTGCC ^ CTATG ^^ 

TCrGGATATGGGGATAGCTGGTGTTGC TTGGG GGACAATTGTGTCTC 
AATTAAAACTGCCTTATGGGAAGCCAACTTTTGGTTTAGATAAGGAACTGTTGACCTTGG 
^ AGAGCGACTTATGATGAGGGCTCWAGATCrrAGTGATCATTGCmCKrrCGTrr 

^2^ TGC ^ TCGGAG ^ GTOTGACC ^ Gm 

CTGTTGGCCCGAGCAGTTGGAGAGGATGATTGGAAAAGAGTTGCTAGTTTGAGTAAACAAACCTT^ 
TCTTCCTCATGTTGCCCCTGTCCTTTAGTATATATGTCTTGGGTGTACCATTAACTCATCTCTAT^ 
CTAGCGGTGGAGGCTAGTGTTCrAOTGACACTCnTITC^CrACnTGGGACCCCTATGACG 
ATACGGCAGTCTGGCAGGGATTAGGAAATGCACGCCTCCCTTTTTATGCGACAAGTATAGGAATGTGCT 
GCATTGGGACAGGATATCTGATGGGGATTGTGCrTGGTTGGGGCTTGCCTGGTATTTGC^ 
TAATGGTTTTCGCTGGTTATTTCTACGCTATCGTTACCAGCGCTATATGAGCTTGAAAGGATAG 

^KKNKDILNIALPAMGENFLQMLMGMVDSYLVAHLGLIAISG^ 
YHVTEALKnXLLSFLLGFLSIFAGKEMIGUGTERW^ 

SNALNILFSSI^IFVLDMGUGVAWGT1VSRLVGLVILWSQLKLPYGKITFGLDKELLTLALPAAGERU4M 
LWSFGTEAVAGNAIGEVLTQFNYMPAFGVATATVMU^RAVGEDDWK^ 
LTHLYTTDSLAVEASVLVTLPSLLGTPMTTGTVrYTAV^ 
AGSLLDNGFRWLFLRYRYQRYMSUCGZ 

30 wir? 394bo 

GTGGGAAGAATTATCAGAGCAGGTGTAAAGATGGAACATCTTGGAAAAGTATTTCGTGAATTTCGAACAACT 
AATTATTCTTTAAAGGAAGCAGCAGGCGAATCCTGCTCTACCTCTCAGTTATCTCGCn^ 
ACCTGGCAGTCTCCCUi i iui l J GAGATTTTGGATAACATTG\TGTAACAATCGAAAATTTG\TGGATAAGGCAAG
GAATTTTCATAATCATGAACATGTGTCTATGATGGCACAGATTATC

TTTC AAAAGCTTCAAAGAGAACAACTTGAAAAGTCTAAGAGTTCGACGACTCCCCTTTATTTT^ 

TTTTGCTACAmAGGTCTGATTTGTCAAAGAGATGCGAGTTATGATATGAAGCAGGATGATTTGGGTAAGGTAGC^ 

ATTATCTCTTCAAAj^CAGAAGAATGGACCATGTATGAGTTGATTCTTTTCGGTAACCTCT 

AGACTATGTCACTCGGATTGGTAGAGAAGTTATGGAGAGGGAGGAATTTTACCAAGAGATTAGTCGCCATAAGAG 
ATTAGTGTTG ATTTTGGCCCTCAATTGTTACCAGCATTGTTTAG AGCATT C l ' I C I ' 1 ' 1 ' 1 'AT AATGCCAACTATTTTG ^

AGGCTTATACAGAGAAGATTATTGACAAAGGTATTAAGCnTATGAGCGTAATGTTTTCCAT^ 
TGCCTTATATCAAAAAGGACAGTGTAAAGAAGGCTGTAAGCAGATGCAAGAGGCCATGCATATTTTTGATGTGTT 
AGGTCrrCCAGAGCAAGTAGCCTATTATCAGGAACACTACGAAAAATTTGTCAAAAGTTAA 

VG ^^GVKMEHLGKVFR£FRTSGNYSLKEAAGESCSTSQ

HEHVSM^QIIPLYYSNDUGFQKLQREQLEKSKSSTTPLYFELNWIUjQGLICQRDASYDM 
WTMYEULFGNLYSFYDVDYVTRIGREVMEREEFYQEISRHKIU,VL^ 

klyernvfhylkgfalyqkgqckegckqmqeamhifdvlglpeqvayyqehyekfvksz

TABLE 3
IP! HOftfop 

ATGTCTAAC ATTCA AAACATGTCCCTGGAGGACATCATGGGAGAGCGCTTrGGTCGCTACTC^ 

AAGACCGG GCTTT GCCAGATATTCGTGATGGGTTGAAGCCGGTTCAGCGCCGTATTCTTTATTCTA 

TAGCAATACI 1 1 I GACAAGAGCTACCGTAAGTCGGCCAAGTCAGTCGGGAACATCATGGGGAATTTCCACCCACA 

CGGGGATTCTTCTATCTATGATGCCATGGTTCGTATGTCACAGAACrGGAAAAATCGTGAGATTCTAGT^ 

CACGGTAATAACGGTTCTATGGACXjGAGATCCTCCTGCGGCTATGCGTTATACT 

CAGGCTACcricrrCAGGA TATCG AGAAAAAGACAGTTCCTTTTGCATGGAACTTT^ 

CAACGGTCTTGCCAGCAGCCTTTCCAAACXrrCTTGGTGKATGGTTCGACTGCGATTTCGGCTGGT^ 

CATTCCTCCCCATAATTTAGCTGAGGTCATAGATGCTGCAGTTTACATGATTGACCACCCAACTGCAAAGATTGAT 

AAACTCATGGAATTCTTGCCTCKIACCAGACTrCCCTACAGGGGCTATTATTCAGGGTCCT 

GCTTATGAGACTGGGAAAGGGCGCGTGGTTGTTCGTTCCAAGACTGAAATTGAAAAGCTAAAAGGTGGT 

CAAATCGTTATTATTGAGATTCCTTATGAAATCAATAAGGCCAAT 

ATA-ACAAGGTAGCTGGGATTGCTGAGGTTCGTGATGAGTCTGACCGTGATGGTCTTCGTATCGCTATCGAACTT^ 

GAAAGACGCTAATACTGAGCTTGTTCTCAACTACTTAmAAGTACACCGACCTACAAATCAACT^ 

ATGGTGGCGATTGACAATTTCACACCTCGTCACKnTGGATrGTTCCAATCCTOTCTAGCTATAT 

AGAAGTGA 

MSNIQNMSLEDtMGERFGRYSKYnQDRALPDIRIXjUCPVQRRILYSM^ 

SIYDAMVRMSQ^WKNREILVEMHGNNGSMDGDPP.\AMRYTEARi^EIAGYIXQDIEKKTVPF 

AAFPNLLVNGSTGISAGYATDlPPHNLA£VIDAAVYMn5HPTAKIDKLMEFL^^ 

VVRSKTEIEKLKGGKEQIVIIEIPYEINKANLVKKIDDVRV>^ 

YTDLQINYNFNMVAU5NFTPRQVGLFQSCLA1SLTVEK2 

ID12 684bp

ATGCCGACATTAGAAATAGCACAAAAAAAACTGGAGTTCATTAAGAAGGCAGAAGAATATTACAATGCCTTGTCT 
ACAA AT AT AC A GTTG AG CG G AG AT A AACT AAAAGT AATTTCCGTTACTTCTGTT AACCCTGGGG AAGG AAAAAC A 
ACTACTTCCATAAATAT AGCATG GTCGTTTGCGCGTGCAGGCTATAAAACTCTTTTGATCGATGGCGATA 
ATTCAGTTATGTTAGGAGTTTTTAAATCTCGTGAAAAAATTACAGGGCT 
TTTATCTCACGGTITATGTGATA CAAAT ATTGAAAATTTATT^ 

AC AGC CTTGTT AC AAAGT AAA AA7TTT AATG AT ATG ATTG AAACATTGCGT AAAT ATTTTG ATT ATATC ATT ATTG 

ATACACCGCCTA7TGGAATTGTTATTGATGCGGCAATTATCACTCAAAAGTGTGATGCGTCCATCTTGCT 

AACAG GTGAG GCGAATAAACCrTGATATCCAAAAAGCGAAACAA(>ATTAAAACAAACAGGGAAACTCrrTCCrAG 

G AGTTGTTTT AAAT AAATTGGATATCTCGGTTAATAAGT ATG GAGTTTACGGTTCCTATGGAAAT^ 

ATAA 

MPTLEIAQKKLEFIKKAEEYYNALCTOTQLSGDKLKVISTO 
GVFKSREKrTGLTEFUiGTADLSHGLCDTNIENLFVVQSGSVSPNPTAIX 

ITTQKCDASILVTATGEANKRDIQKAKQQLKQTGKLaGWLNKLDISVNKYGVYGSYGNYGKKZ 
ID13 1182bp

ATGGA GGCAAATATGAAACAT CTAAA AACATTTTACAAAAAATGGTrTCAATTATTACT 
1 1 nTAGTGGAGCCTTGGGTAGTTTrrCAATAACTCAACTAACT 

T AGT A CTATT ACACAAACTG CCTATAAGAACXjAAAATTCAACAACACAGGCTGTTAAC AAAGT AAAAG ATGCTGT 
TGTTTCTGTTATTACTTATTCGGCAAACAGACAAAATAGCGTATTTGCK^TGATG 

CGAATCrCTAGTGAAGGATCTGGAGTTATTTATAAAAAGAATGATAAAGAAGCTrACATCGTCACCAACAATCAC 
GTTA TTAA TGGCGCCAGCAAAGTAGATATTCGATTGTCAGATGGGACT AAAGT ACCTGGAGAAATTGTCGGAGCT 
GACACTTTCTCTGATATTGCrGTCCrrCAAAATCTCrTCAGAAAAAGTGACAA 
CTAAGTTAACTGTAGGAGAAACTGCTATTGCCATCGGTAGCCCGTTAGGTTCTG 

AGGTATCGTATCCAGTCTCAATAGAAATGTATCCTTAAAATCGGAAGATGGACAAGCTATTTCTACAAAAGCCAT 

CCAAACTGATACTGCTATTAACCCAGGTAACTCTGGCGGCCCACrGATCAATATTCAAGGGCAGGTTATCGGAAT 

TACCTCAAGTAAAATTGCTACAAATGGAGGAAC\TCTGTAGAAGGTCrTGCrmCGCAATTCCT^ 

ATCAAT ATT ATTG AACAGTTAGAAAAAAACGGAAAAGTGACGCGTCCAGCTTTGGGAATCC^ 

TCTAATGTGAGTACAAGC^ACATCAGAAGACrCAATArrcCAAGTAATGTTACATCTGGTGTAATTGTTCGTTCGG 

TACAAAGTAATATGCCTGCCAATG GTCACCTTGAAAAATACGATGTAATTAC AAAAGT AG ATG ACAAAG AG ATTG 

CTTCATCAACAGACTTACAAAGTGCTCTTTA(^ACCATTCrATCGGAGAC^ 

CGGGAAAGAAGAAACTACCTCTATCAAACTrAACAAGAGTTCAGGTGATTTAGAATCrTAA 

MEANMKHLKTFYKKWFQLLVVIVISFFSGALGSFSmjLTQKSSVNNSNNNSmQTAYKNENSTnjAVNKVKDAV^ 

rTYSANRQNSVFGNDDTDTDSQRISSEGSGVIYKKNDKEAYrVTNNHVlNGASKV^ 

VKISSEKVTTVAEFGDSSKLTVGETAIAIGSPLGSEYANmQGIVSSL^^
UNIQCXJVIGITSSKIATNGOTSVEGLGFAIPANDAINIIE^

RSVQSNMPANGHLEKYDVnTCVDDKEMSSTDLQSALYNHSIGDTIKrTTYRNGKEETTSKUNKSSGDLESZ 
ID15 939bp

ATGGCAGAAATTTATCTAGCAGGTGGTTGTITTO 

AAACCAGTGTTGGCTACGCTAATGCTCAAGTCGAAACGACCAATTACCAGTTGCTCAAGGAAACAGACCATGCAG 
AAACGGTCCAAGTGATTTACGATGAGAAGGAAGTGTCACTCAGAGAGATTTTACT^ 

TCCTCTATCTATCAATCAACAAGGGAATGACCGTGGTCGCCAATATCGAACTGGGATTTA1TATCAGGATGAAGC 

AGATTTGCCAGCTATCTACACAGTCKrrGCAGGAGCAGGAACGCATGCTGGGTCGAAAGATTGC^ 

GCAATTACGCCACTACATrCTGGCTGAAGACTACCACCAAGACTATCTCAGGAAGAATCCTTCAG 

ATCGATGTGACCGATGCTGATAAGCCATTGATTGATGCAGCAAACTATGAAAAGCCTAGTCAAGAGGTGTTGAAG 

GCCAGTCrATCTGAAGAGTCTTATCGTGTCACACAAGAAGCTGCTACAGAGGCTCCATTTACCAATGCCT 

AAACCTTTGAAGAGGG GATTTA TGTAGATATTACGACAGGTGAGCCACTCTTTTTTGCCAAG 

AGGTTGTGGTTGGCCAAGTTTTAGCCGTCCGATTTCCAAAGAGTTGAT^ 

ATGGAGCGAATTGAAGTTCGTTCTCGTTCAGGCAGTGCTCACTTGGGTCATGTTTTCACAGATGGACCGCGGGAGT 
TAGGCGGCCTCCGTTACTGTATCAATTCTGCTTCnTACGCTTTGTGGCCAAGGATGAGATGGAAAAAGCAGGATA 
TGGCTATCTATTGCCTTACTTAAACAAATAA 

^1AEIY^^GGCFWGL£EYFSRISGVLCTSVGYANGQVTOT 

NQ(^NDRGRQYRTGIYYQDEADLPAIYTVVQEQERMLGRKUVEVEQUIHYI^ 

DKPLIDAANYEKPSQEVLKASl^EESYRVTQEAATEAPFTNAYIXJTFEEGIYVDrrrGEPLFFAKDKFASGCGWPSFSRPI 
SKEUHYYKDLSHGMERlEVRSRSGSAHLGHVRDGPRELGGLRYCINSASLRi^AKDEMEKAGYGYLU'YLNKZ 

ID17 870bp 

ATGAAGATTATTGTACCTGCAACCAGTGCCAATATCGGGCG\GGTTTTGACTCGGTCGGTGTAGCTGTAACCAACT 
ATCTTCAAATTGAGGTCTGCGAAGAAC GAGA TGAGTCGCTGATTGAA 

ACGAGCGT AATCT CTTGCTCA AAAT CGCTTTGCAAATTGTACCAGACTTGCAACCAAGACGCTTGAA^ 
GTGATGTCCCTTTGGCGCGCGGTTTGGGTTCTTCCAGCT 

GGGTCAACTCAACrTATCAGACCATGAAAAATTGCAGTTAGCGACCAAGATTGAAGGGCATCCTGACAATGTGGC 

TCCAGCCATTTATGGTAATCTCGTTATTGCAAGTTCTGTTGAAGGGCAAGTCTCTGCTATCGTAGCAGA 

G AGTGTG ATTTTCTAGCTT A CATTCCAAACTATG AATTACGTACTCGCG ACAGCCGTAGTGTCTTG CCTAAAAAAT 

TGTCTTATAAGGAAGCTGTTGCTGCAAGTTCTATCGCCAATGTAGCGGTTGCTGCCTTGTTGGCAGGAGACATGGT 

GACCGCTGGGCAAGCAATCGAGGGAGACCTCTTCCATGAGCGCTATCGTCAGGACTTGGTAAGAGAATTTGCGAT 

GATTAAGCAAGTGACCAAAGAAAATGGGGCCTATGCAACCTACCTTTCTGGTGCTGGGCCGACAGTTATGGTTCT 

GGCTTCTCATGACAAGATGCCAACAATTAAGGCAGAATTGGAAAAGCAACCTTTCAAAGGAAAACTG^ 

GAGAGTTGATACCCAAGGTGTCCGTGTAGAAGCAAAATAA 

MKIIVPATSANIGPGFDSVGVAVTTYLQIEVCEERDEWLIEHQIGKWIPHDERNLLLKIALQIVPDLQPRRLKNfnDVPLA
RGLGSSSSVIVAGIELANQLGQLNLSDHEKLQLATKIEGHPDNVAPAIYGNLVIASSVEGQVSAIVADFPECDFLAYIPNY
ELRTRDSRSVLPKKLSYKEAVAASSIANVAVAALLAGDMVTAGQAIEGDLFHERYRQDLVREFAMIKQVTKENGAYAT
YLSGAGPTVMVLASHDKMPTIKAELEKQPFKGKLHDLRVDTQGVRVEAKZ 
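
Each entry in this listing pairs a DNA sequence with its predicted protein, and the protein lines appear to end in Z where the DNA ends in a stop codon. As a minimal sketch, assuming the standard genetic code and assuming that Z is the listing's stop symbol (neither is stated in the text), the following Python translates the last 36 nucleotides of the ID17 DNA sequence and reproduces the end of its protein line. The function name `translate` is hypothetical and is used here only for illustration.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, stop_symbol="Z"):
    """Translate an in-frame DNA string, writing the stop codon as stop_symbol."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            # Assumption: the listing marks the stop codon with 'Z'.
            protein.append(stop_symbol)
            break
        protein.append(amino_acid)
    return "".join(protein)


# Last 36 nucleotides of the ID17 DNA sequence as printed above.
id17_tail = "GTTGATACCCAAGGTGTCCGTGTAGAAGCAAAATAA"
print(translate(id17_tail))  # VDTQGVRVEAKZ
```

The same check can be applied to any cleanly legible, in-frame stretch of the listing to confirm that a DNA line and its protein line agree.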

ID20 564bp

ATGAAATATCACGATTACATCTGGGATTTAGGTGGAACTTTACTGGATAATTATGAAACTTCAACAGCT^ 

TTGAAACATTGGCACTCTATGGTATCACACAAGACCATGACAGTGTCTATCAAGCTTTAAAGGTTTCT 

TGCGATTGAGACATTCGCTCCCAATTTAGAGAATTTTTTAGAAAAG

ACACCCGATTTTATTTGAAGGAGTTTCTGACCTATTGGAAGACATTTCAAATCAAGGTGGCCOT
TCTCATCGAA ATGA TCAGOl 1 1 iGGAAATrTTAGAAAAAACCTCTATAGCAGCTTATTTTACAGAAGTGGTGAOT 
CTAGCTCAGGCTTTAAGAGAAAGCCAAATCCCGAATCCATGCTTTATTTAAGAGAAAACTATC^ 
GTCTTGTCATTGGTGATCGGCCGATTGATATCGAAG^ 
TATCGTGAATTTAAGACAAGTATTAGACATATAA 

MKYTIDYIWDLGGTLLDhfYETSTAAFVETLALYGrTQDHDSVYQ 

LFEGVSDLLEDISNQGGRHFLVSHRNDQVLEILEKTSIAAYFTEVVTSSSGFKRKPNPESMLYLRENYQISSGLVIGDRPID
IEAGQAAGLDTHLFTSIVNLRQVLDIZ

ID21 1875bp

ATGACAGAAGAAATCAAAAATCTGCAGGCACAGGATTATGATGCCAGTCAAATTCAAGTTTTAGAGGGCTTAGAG 
GCTGTTCGTATGCGTCCAGGGATGTACATTGGATCAACCTCAAAAGAAGGTCTTCACCATCTAGTCTGGGAAATTG 
TTGATAACTCAATTGACGAGGCCTTGGCAGGATTTGCCAGCCATAT^^ 

TACTGTTGTGGATGATGGGCGTGGTATCCCAGTCGATATTCAGGAAAAAACAGGCCGTCCTGCTGTTGAGACCGT
CTTTACAGTCCTTCACGCTGGAGGAAAGTTCGGCGGTGOT
GTCGTCAGTAGTTAATGCCCTTTCCACTCAATTAGACGTTCATGT^ 
TACCGTCGTGGTCATGTTGTCGCAGATCTTGAAATAGTTGGAGATACGGATAAAAC^ 
ACACCGGACCCAAAAATCTTCACTGAAACAACAATCmGATTTTGATAAA'ITAAA 
GCCTTTCTAAATCGCGGTOTCAAATTTCAATTACAGATAAGCGCC^AGCrm

ATGAAGGTGGGATTGCTAGTTACGTTGAATATATC>ACGAGAACAAGGATGTAATCTTTGATACACCAATCTATA 
GVGACGGTGAGATGGATGATATCACAGTTGAGGTAGCCATGCAGTACACAACTGGTTACCATGAAAATGTCATGA 
GTTTCGCCAATAATATTCATACCCATGAACGTGGAA<^CA 

CAACGATTATGCTCGTAAAAATAAGTTACTGAAAGACAATGAAGATAATTTAACAGGGGAAGATGTTCGCGAAGG 
CTTAACTGCAGTTATCTGAGTTAAACACCCAAATCCACAGTTTC

CGAAGTGGTCAAGATTACCAATCGCCTCTTCAGTGAAGCTTTCTCCGATTTCCTCATGGAAAATCCACAG 

AAACGTATCGTAGAAAAAGGAATTTTGGCTGCCAAGGCTCGTGTGGCTGCCAAGCGTGCGCGTGAAGTCACACGT 

AAAAAATCTGGTTTGGAAATTTXTCAACCTTCCAGGGAAACTAGCAGACn'GTTCT^ 

AACTCTTCATCGTCGAAGGAGACTCAGCTGGTGGATCAGCCAAATCTGGTCGTAACCGTGAGTTTCAGGCTATCCT

1 0 T CCAAT TCGCGGTAAGATTTrGAACGTTGAAAAAGCAAGTATGGATAAGATTCT 

TCTTTTCACAGCCATGGGAAG\GGATTTGGCGCAGAATTTGATGTTTCGAAAGCCCGTTACCAAAAACT 
ATGACCGATGCCGATGTCGATGGAGCCCACATTCGTACCCTTCTTTTAACCrTGATTTATCGTTATAT^ 
TCXTAGAAGCTGGTTATGTTTATATTGCCCAACCACCAATCTATGGTGTCAAGGTTGGAAGCGAGATTAAAGAATA 
TATCCAGCCGGGTGCAGATCAAGAAATCAAACTCCAAGAAGCTTTAGCCCGTTATAGTGAAGGTCGTACCAAACC 

GACTATTCAGCGTTATAAGGGGCTAGGTGAAATGGACGATCATCAGCTGTGGGAAACAACCATGGATCCCGAACA
TCGCTTGATGGCTAGAGTTTCTGTAGATGATGTGCAGAAGCAGATAAAATCTTTGATATGTTGA 

MTEElKNLQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKEGLHHLVWErV^ 

DGRGIPVDIQEKTGRPAVETVFTVLHAGGKFGGGGYKVSGGLHGVGSSVVNALSTQLDVHVHKNGKIHYQEYRRGHV 
VADLEIVGDTDKTGTTVHFTPDPKIFTETTIFDFDKL^

NENKDVlFDTPIYTTKSEMDDrrVEVAMQYTTGYHENVMSFANNIHTHEGGTHEQGFRTALTRVINDYARK^ 
EDNLTGEDVREGLTAVISVKHPNPQFEGQTKTKLGNSEVWCrTNRL 

RAREVTRKKSGLEISNLPGKLADCSSNNPAETELFIVEGDSAGGSAKSGRNREFQAILPmGKILNVEKASMDKILAN^ 
RSLmMGTGFGAEFDVSKARYQKLVLNn-DADVDGAHIRTLLLTLIYRYMKPILEAGYVYlAQPPiYGVKVGSEDCEYI 
QPGADQEIKLQEALARYSEGRTKPTIQRYKGLGEMDDHQLWETTMDPEHRLMARVSVDDVQKQIKSLICZ

ID54 1446bp

ATGAGTAGACGTTTTAAAAAATCACGTTCACAGAAAGTGAAGCGAAGTGTTAA

JD TATTGTTAGTTTG 1 UUU UU'ATTGTTCTTAATCTTT AAGTACAATATCCTTGCTTTTAGATATCTTAATCTAGTGGTAA 

CTGCGTTAGTCCTACTAGTTGCCTTGGTAGGGCTACTCTTGATTATCTATA 
CTGTTGGTGTTCTCTATCCTTGTCAGCTCrGTGTCGCTCTTTGCAGTACAGCAGTTTGTTGGACTG 
AAATGCGACTTCTAATTACTCAGAATATTCAATCAGTGTCGCTGTTTTAGCAGATAGTGAGATCGAAAATGTTACG 
CAAO-GACGAGTGTGACAGCACCGACTGGGACTAATAATGAAAATATTCAGAAATTACTAGCTGATATCAAGTCA 

AGTCAGAATACCGATTTGACGGTCAACCAGAGTTCGTCTTACTTGGCAGCTTACAAGAGTTTGATTGCAGGGGAG
ACTAAGGCCATTGTCCTAAATAGTGTCTTTGAAAACATCATCGAGTCAGAGTATCCAGACTACGCATCGAAGATA 
AAAAAGATTTATACTAAGGGATTCACTAAAAAAGTAGAAGCTCCTAAGACGTCTAAGAGTCAGTCTTTC 
TATGTTAGTGGAATTGACACCTATGGTCCTATTAGrrCGGTGTCGCGATCAGATGTCAACATCCTGATGACTGTCA 
ATCGAGATACCAAGAAAATCCTCTTGACCACAACGCCACGTGATGCCTATGTACCAATCGCAGATGGTGGAAATA 

ATCAAAAAGATAAATTGACTCATGCGGGCATTTATGGAGTTGATTCGTCCATTCACACCTTAGAAAATCTCTATGG
AGTGGATATCAATTACTATGTGCGATTGAACTTCACTTCGTTTTTC 

TTTATAATGATCAAGAATTTACTGCCCATACGAATGGAAAGTATTACCCTGCAGGCAATGTTCATCTTGATTCAGA
ACAGGCTCTCGGTTTTGTTCGTGAGCGCTACTCCCTAGCAGATGGCGATCGTGACCGCGGGCGCCATCAACAAAA 
GGTGATTGTTOCTATCCTTCAAAAATTAACCn'CAACCGAAGTGCTGAAAAATTA 
CAAGATTCTATCCAAACAAATATGCCACTTGAGACCATGATAAATTTGGTCAATGCTCAGTTAGAAAGTGGAGGG
AATTATAAAGTAAATTCTCAAGATTTAAAAGGGACAGGTCGGATGGATCTTCCTTCTTATGCAATGCCAGACAGTA 
ACCTCTATGTGATGGAAATAGATGATAGTAGTTTAGCTGTAGTTAAAGCAGCTATACAGGATGTGATGGAGGGTA 
GATGA 

MSRRFKKSRSQKVKRSVNIVLLTIYLLLVCFLLFLlTC^

ILVSSVSLFAVQQFVGLTNRLNATSNYSEYSISVAVLADSEIENVT^

SSSYLAAYKSLIAGETKAIVLNSVFENIIESEYPDYASKIKKIYTKGFTKKVEAPKT^
NILMTVNRDTKKILLTTTPRDAYVPIADGGNNQKDK^

IDVYNDQEFTAHTNGKYYPAGNVHLDSEQALGFVRERYSLADGDRDRGRHQQ
^IQTNMPLETMINLVNAQLESGGNYKVNSQDLKGTGRMDL

ATGATAGACATCCATTCGCATATCGTTTTTGATGTAGATGACGGTCCCAAGTCAAGAGAGGAAAGCAAGGCTCTC
TTGGCAGAATCCTACAGACAGGGGGTGCGAACCATTGTTTCTACCTCTCACCGTCGCAAGGGCATGTTTGAAACT 
CGGAAGAGAAGATAGCAGAAAACTTTCTTCAGGTTCGGGAAATAGCTAAC^^ 

CTTACGGGGCTGAAATTTATTACACACCAGATGTTCTGGATAAGCTGGAAAAAAAGCGGATTCCGACCCT 
D ATAGTCGTTATGCCTTGATAGAGTTTAGTATGAACACTCCTTATCGCGATATTCATAGCGCCTTGAGCAAGATCTT 
GATGTTGGGAATTACTCCAGTCATTGCCCACATTGAGCGCTAT 

GAACTGATCGATATGGGCTGTTACACGCAAGTA^TAGTTCAC^TGTCCTCAAACCCA 
ATAAATTCATGAAAAAAAGAGCTCAGTATTTrrTAGAGCAGGATTTGGTTCATGTCATTGCAAGTGAT 
TCTAGACGCTAGACCTCCTCATATGGCAGAAGCATATGACCTTGTTACCCAAAAATACGGAGAAGCGAAGGCTCA 
1 U GGAACIT1 1"I ATAGACAATCCTCGAAAAATTGTAATGGATCAACTAATTTAG 

MIDIHSHIVFDVDDGPKSREESKALLAESYRQGVRTIVSTS

YYTPDVLDKLEKKRIPTLNDSRYAUEFSMNTPYRDIHSAL^K^ 

NSSHVLKPKI^GERYKFMKKRAQYFl^QDL^ 

DQLIZ

ID58 3990bp

TTGATTTATATAATCGCTATCAATATAACAATGCAATCAGGAGGTTTTGCAATGAAACATGAAAAACAACAGCGT
TTTTCTATTCGTAAATACGCTGTAGGAGCAGCn'CTGTTCTAATTGGATTTGCCTTCCAAGCA

CCGATGGAGTTACTCCTACTACTACAGAAAACCAACCGACCATCCATACGGTTTCTGATTCCCCTCAATCATCCGA 
AAATCGGACTGAGGAAACACCTAAAGCAGTGCTTCAACCAGAAGCTCCAAAAACTGTAGAAACAGAAACTCCAG 
CTACTGATAAGGTAGCTAGTCTTCCAAAAACAGAAGAAAAACCACAAGAGGAAGTTAGTTCAACTCCTAGTGATA 
AAGCAGAAGTGGTAACTCCAACTTCTGCTGAAAAAGAAACTGCTAATAAAAAGGCAGAAGAAGCTAGCCCT 
AAGGAAGAAGCGAAAGAGGTTGATTCTAAAGAGTCAAATACAGACAAGACTGACAAGGATAAACCAGCTAAAAA

AGATGAAGCGAAAGCAGAG GCTGA CAAACCGGCAACAGAGGCAGGAAAGGAACGTGCTGCAACTGTAAATGAAA 

AACTAGCGAAAAAGAAjVATTGTTTCTATTGATGCTGGACGTAAATATTTCTCACCAGAACAGCTCAAG 

TCGATAAAGCGAAACATTATGGCTACACTGATTTACACCTATTAGTCGGAAATGATG 

CGATATGAGCATCACAGCTAACGGCAAGACCTATGCCAGTGACGATGTCAAACGCGCCATTGAAAAAGGTACAAA 
TGATTATTACAACGATCCAAACGGCAATCACTTAACAGAAAGTCAAATGACAGATCTGATTAACTATGCCAAAGA
TAAAGGTATCGGTCTCATTCCGACAGTAAATAGTCCTGGACACATGGATGCGATTCTCAATGCCATGAAAGAATT
GGGAATCCAAAACCCTAACTTTAGCTATTTTGGGAAGAAATCAGCCCGTACTGT^ 
TGTCGCTTrrACAAAAGCCCTTATCGACAAGTATGCTGCrrATTTCGCGAAAAAGACTGAAATOT 
CTTCATGAATATGCCAATGATGCGACAGATGCTAAAGGTTGGAOT^^ 
GAAGGCTACCCTGTAAAAGGCTATGAAAAATTTATTGCCTACGCGAATGACCTCGCTCGTATTGTAAAATCGCA
GGTCTCAAACCAATGGCTITTAACGACGGTATCTACTACAATAGCGACACAAGCT^ 
ATCATCGTTTCTATGTGGACTGGTGGTTGGGGAGGCTACGATGTCGCTTCTTCTAAACTACTAGCTGAAAA^ 
ACCAAATCCTTAATACCAATGATGCTTGGTACTACGTTCTTGGACGAAACGCTGATGGCCAAGGCTGGTACAATCT 
CGATCAGGGGCTCAATGGTATTAAAAAGACACCAATCACTTCTGTACCAAAAACAGAAGGAGCTGATATCCCAAT
CATCGGTGGTATGGTAGCTGCTTGGGCTGACACTCCATCTGCACGTTATTCACCATCACGCCTCTTCAAACTC

CGTCATTTTGCAAATGCCAACGCTGAATACTTCGCAGCTGATTATGAATCTGCAGAGCAAGCACrTAACGAGGTA 
CCAAAAGACCTGAACCGTTATACTGCAGAAAGCGTCACGGCCGTAAAAGAAGCTGAAAAAGCTATrCGCTCTCTC 
GATAGCAACCTTAGCCGTGCCCAACAAGATACGATTGATCAAGCCATTGCTAAACTrCAAGAAACTCT 
TTGACCCTCACGCCTGAAGCTCAAAAAGAAGAAGAAGCTAAACGTGAGGTTGAAAAACTTGCCAAAAACAAGGT 
AATCTCAATCGATGCTGGACGCAAATACTTTACTCTGAACCAGCTCAAACGCATCGTAGACAAGGCCAGTGAGCT
CGGATATTCTGATGTCCATCTCCTTCTAGGAAATGACGGACTTCGCmCTACTCGATGATATGACC 
AACGGAAAAACCTATGCTAGTGATGACGTTAAAAAAGCTATTATCGAAGGAACTAAAGCTTACTACGACGATCCA 
AACGGTACTGCACrAACACAGGCAGAAGTAACAGAGCTAATTGAATACGCTAAATCTAAGGACATCGGTCTCATC 
CCAGCTATTAACAGTCCAGGTCACATGGATGCTATGCTGGTTGCCATGGAAAAATTAGGTATTAAAAATCCTCAA
GCCCACTTTGATAAAGTTTCAAAAACAACTATGGACTTGAAAAACGAAGAAGCGATGAACTTTCT

ATCGGTAAATACATGGAC1 1 U III GCACKjrAAAACAAAGATTTTCAACTTTGGTACTGACGAATACGCCAACGAT 
GCGACTAGTGCCCAAGGCTGGTACTACCTC^ACrrGGTATCAACTCTATGGCAAATTrG 

CTCGCAGCTATGGCCAAAGAAAGAGGGCTTCAACCAATGGCCTTCAACGATGGCTTCTACTATGAAGACAAGGAC 
GATGTTCAGTTTGACAAAGATGTCTTGATTTCTTACTGGTCTAAAGGCTGGTGGGGATATAACCTCGCATCACCTC 

AATACCTAGCAAGCAAAGGCTATAAATTCTTGAATACCAACGGTGACTGGTACTACATTCTTGOT

AAGATGGTGGTGGTTTCCTCAAGAAAGCTATTGAGAATACTGGAAAAACACCATTCAATCAACTAGCTTCTA
AATATCCTGAAGTAGATCTTCCAACAGTCGGAAGTATGCTTTCAATCTGGGCAGATAGACCAAGCGCTGAATACA
AGGAAGAGGAAATCTTTGAACTCATGACTGCCTTTGCAGACCAC\ACAAAGACTACTTTCGTC 
CTCTCCGCGAAGAATTAGCTAAAATTCCTACAAACTTAGAAGGATATAGTAAAGAAAGTCTTGAGGCCCTTGACG 

CAGCTAAAACAGCTCTAAATTACAACCTCAACCGTAATAAACAAGCTGAGCTTGACACGOT

AAGCCGCTCTTCAAGGCCTCAAACCAGCTGTAACTCATTCAGGAAGCCTAGATGAAAATGAAGTGGCT 
TTGAAACCAGACCAGAACTCATCACAAGAACTGAAGAAATTCCATTTGAAGTTATCAAGAAAGAAAATCCTAACC 
TCCCAGCCGGTCAGGAAAATATTATCACAGCAGGAGTCAAAGGTGAACGAACTCATTACATCTCTGTACTCACTG 
AAAATGGAAAAACAACAGAAACAGTCCTTGATAGCCAGGTAACCAAAGAAGTTATAAACCAAGTGGTTGAAGTr 

GGCGCTCCTGTAACTCACAAGGGTGATGAAAGTGGTCTTGCACCAACTACTGAGGTAAAACCTAGACTGGATATC
CAAGAAGAAGAAATTCCATTTACCACAGTGACTTCTGA^
ACTAAGGGCGTCAATGGACATCGTAGCAACTTCTACTCTGTGAGCACTTCT 

CTTGTAAATAGTGTCGTAGCACAGGAAGCCGTTACTGVAATAGTCGAAGTCGGAACTATGGTAAOVCATGTAGGC 
GATGAAAACGGACAAGCCGCTATTGCTGAAGAAAAACCAAAACTAGAAAra 
TGCTCCTGCTGAGGAAAGCAAAGTTCTTCCTCAAGATCCAGCTCCTGTCKrrAA

AGGAACTTCACGATTCTGCAGGACTAGTAGTCGCAGGACrCATGTCCACACTAGCAGCCTATGGACTCACTAAAAG 
AAAAGAAGACTAA 

MIYIIAINITMQSGGFAMKHEKQQRFSIRKYAVGA
TPKAVLQPEAPKTVETETPATDKVASLPKTEEKP^

SNTDKTDKDKPAKKDEAKAEADKPATE^^

VGNDGLRFMUDDMSn-ANGKTYA5DDVK]UIEKGTNDYYND 

AIl^AMKELGIQNPNFSYFGKKSARTVDLDNEQAVAFTKALIDKYAAYFAKCT 

DKYYP^YPVKGYEKmYANDLARIVKSHGUCPMAFNM 
EKGHQILNTNDAWYYVLGRNADGQGWYNLDQGLNGIK^

MRHFANANAEYFAAJ3YESAEQALNEVPKDLNRYTAESVTAVKEAEKAIRSU)SN15RAQQDTIM 
LTPEAQKEEEAKREVEKI^NKVISIDAGRiCYF^^ 

SDDVKKAIIEGTKAYYDDPNCTALTQAEVTEUEYAK 
MDLKNEEAMNFVKAUGK YMDFFAGKTKIJWGTDEYAN^^ 
QPMAFNDGFYYEDKDDVQFDKDVLISYWSKGWWGY

ENTGKTPFNQLASTKYPEVDLPTVGSMLSIWADRPSAEY^

YSKESLEAU^AAKTALNYNLNRJTCQAELDTLVANL^ 

KKE^NLPAGQENirTAGVKGERTHYTSVLTENGKTTET^ 

DKJEEEIPFTTVTCENPLLUCGKTQVITKGVNGHRSNFYSVST^^ 

NGQAAIAEEKPKLEIPSQPAPSTAPAEESKVLPQDPAPVVTEK^ 

ID122 825bp

ATGAACAAAAAAACAAGACAGACACTAATCGGACTGCTAGTGTTATTGCTTTTGTCTAC^ 
AAGCAGATGCCGTCGGCACCTAATAGTCCCAAAACCAATCTTAGTCAGAAAAAAG\AGCGTCTGAAGCTCCTAGT
CAAGCATTGGCAGAGAGTGTCTTAACAGACGCAGTG\AGAGTCAAATAAAGGGGAGTCTGGAGTGGAATGGCTC
AGGTGCTTTTATCGTCAATGGTAATAAAACAAATCTAGATGCCAAGGTTTCAAGTAAGCCCrACGCT 
AACAAAGACAGTGGGCAAGGAAACTGTTCCAACCGTAGCTAATGCCCTCTTGT 

GAATCGTAAAGAAACTGGGAATGGTTCAACTTCTTGGACTCCTCCAGGTTGGCATCAGGTCAAGAATCT 
CTCTTATACCCATGCAGTCGATAGAGGTCATTTGm

CAACAAGCAATCCTAAAAACATTGCTGTTCAGACAGCCTGGGCAAATCAGGCACAAGCCGAGTA 
AAAACTACTATGAAAGCAAGGTGCGTAAAGCCrTGGACCAAAACAAGCGTGTCCGTTACCGTGTAACC 
ACGCTTCAAACGAGGATTTAGTTCCCTCAGCTTCACAGATTGAAGCCAAGTCTTCGGATGG^ 
ATGTTCTAGTTCCCAATGTTCAAAAGGGACTTCAACTGGATTACCGAACTGGA 









MNKKTRQTLIGU-VUXl^GSYYKQMPSAPNSP 
^^G^^CTNLDAKVSSKPYAD^^CTKTVGKETVPTVANA1XSKATRQYKNR^ 
DRGHLLGYAUGGUX3FDASTSNPKNUVQTAWANQAQAEYSTC^ 
VPSASQIEAKSSDGELEFNVLVPNVQKGLQLDYRTGEVTVTQZ 

ID123 225bp



GTGCTAAGATTCAGCGGA TTGAG GCAAGTGATGAAGATGAATAAGAAATCAAGCTACGT^ 
TTAGTCATCATAGTACTGATTTTAGGTACTCTGGCrCTAGGAATCGGTTTAATGGTAGGTTATC 
AGGGTCAAGATCCATGGGCTATCCTGTCTCCAGCAAAATGGCAGGAATTGATTCATAAATTTACAGGAAATTAG



VLRFSGLRQVMKMNKKSSYWKRUXVIIVULGTI^GIGLMVGYGI^ 







CLAIMS: 

1. A Streptococcus pneumoniae protein or polypeptide having a sequence
selected from those shown in Table 1.

2. A Streptococcus pneumoniae protein or polypeptide having a sequence
selected from those shown in Table 2.

3. A protein or polypeptide as claimed in claim 1 or claim 2 provided in 
substantially pure form. 

4. A protein or polypeptide which is substantially identical to one defined in any 
one of claims 1 to 3. 

5. A homologue or derivative of a protein or polypeptide as defined in any one of 
claims 1 to 4. 

6. An antigenic and/or immunogenic fragment of a protein or polypeptide as
defined in Tables 1-3.

7. A nucleic acid molecule comprising or consisting of a sequence which is: 

(i) any of the DNA sequences set out in Table 1 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i);

(iii) a sequence which codes for the same protein or polypeptide, as those
sequences of (i) or (ii); 

(iv) a sequence which is substantially identical with any of those of (i), (ii) 
and (iii); 

(v) a sequence which codes for a homologue, derivative or fragment of a 
protein as defined in Table 1.

8. A nucleic acid molecule comprising or consisting of a sequence which is: 

(i) any of the DNA sequences set out in Table 2 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i); 

(iii) a sequence which codes for the same protein or polypeptide, as those 
sequences of (i) or (ii); 

(iv) a sequence which is substantially identical with any of those of (i), (ii) 
and (iii); 

(v) a sequence which codes for a homologue, derivative or fragment of a 
protein as defined in Table 2. 

9. The use of a protein or polypeptide having a sequence selected from those 
shown in Tables 1-3, or homologues, derivatives and/or fragments thereof, as an 
immunogen and/or antigen.

10. An immunogenic and/or antigenic composition comprising one or more
proteins or polypeptides selected from those whose sequences are shown in Tables 1- 
3, or homologues or derivatives thereof, and/or fragments of any of these. 

11. An immunogenic and/or antigenic composition as claimed in claim 10 which 
is a vaccine or is for use in a diagnostic assay. 

12. A vaccine as claimed in claim 11 which comprises one or more additional
components selected from excipients, diluents, adjuvants or the like. 

13. A vaccine composition comprising one or more nucleic acid sequences as 
defined in Tables 1-3. 

14. A method for the detection/diagnosis of S. pneumoniae which comprises the
step of bringing into contact a sample to be tested with at least one protein or 
polypeptide as defined in Tables 1-3, or homologue, derivative or fragment thereof. 

15. An antibody capable of binding to a protein or polypeptide as defined in
Tables 1-3, or to a homologue, derivative or fragment thereof.

16. An antibody as defined in claim 15 which is a monoclonal antibody. 

17. A method for the detection/diagnosis of S. pneumoniae which comprises the step
of bringing into contact a sample to be tested and at least one antibody as defined in
claim 15 or claim 16.

18. A method for the detection/diagnosis of S. pneumoniae which comprises the
step of bringing into contact a sample to be tested with at least one nucleic acid
sequence as defined in claim 7 or claim 8.

19. A method of determining whether a protein or polypeptide as defined in 
Tables 1-3 represents a potential anti-microbial target which comprises inactivating 
said protein or polypeptide and determining whether S. pneumoniae is still viable. 

20. The use of an agent capable of antagonising, inhibiting or otherwise 
interfering with the function or expression of a protein or polypeptide as defined in 
Tables 1-3 in the manufacture of a medicament for use in the treatment or 
prophylaxis of S. pneumoniae infection.
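
Claims 7 and 8 refer to the "RNA equivalents" of the listed DNA sequences and to sequences "complementary" to them. As an illustrative sketch only, assuming that an RNA equivalent is the same sequence with U in place of T, and that a complementary sequence is the Watson-Crick reverse complement (the claims do not define either term here), both can be derived from a listed DNA sequence as follows. The function names are hypothetical.

```python
# Watson-Crick pairing for DNA: A<->T, C<->G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_equivalent(dna):
    """Return the RNA form of a DNA string (assumption: T is replaced by U)."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


# First 21 nucleotides of the ID17 DNA sequence from the listing.
fragment = "ATGAAGATTATTGTACCTGCA"
print(rna_equivalent(fragment))      # AUGAAGAUUAUUGUACCUGCA
print(reverse_complement(fragment))  # TGCAGGTACAATAATCTTCAT
```

Applying `reverse_complement` twice returns the original fragment, which is a quick sanity check on any sequence taken from the tables.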